1.   INTRODUCTION

Decision tables have been widely accepted as a convenient
technique for specifying complex logical relationships in such diverse
computer application areas as data processing in census studies, process
control in manufacturing, management information processing systems, and
so on (see, e.g., McDaniel [17], [18], [19] or Pollack et al. [24]).
A key problem for the successful use of decision tables is how to process
them efficiently on a computer. One possible way is to convert (preprocess,
translate or compile) a decision table into a special kind of flowchart,
known as a decision tree. A simple but reasonable measure of the complexity
of such a decision tree is the number of its internal nodes, i.e., the
number of decision boxes which appear in it. This is a special case of more
elaborate measures of the memory requirement and the average processing
time of decision trees.

Several methods have been proposed in the literature for converting
decision tables into decision trees with the objective of minimizing these
two measures of complexity. Some of these, intended primarily as manual
methods (Egler [2], Press [25], and Pollack [23]), are based on plausible
arguments but have little theoretical background. Others, such as the use
of the branch-and-bound method (Reinwald and Soland [26], [27]), are very
general in nature and might be improved with more knowledge of the specific
structure of decision trees. A recent thesis by Garey [3] investigates such
problems concerning the structure of optimal decision trees. However, this
structure is still not sufficiently well understood to permit the design
of efficient algorithms for the construction of optimal decision trees.
Hence, further investigation of this topic is appropriate.

This thesis is devoted to theoretical investigations of decision
table conversion problems. For this purpose we present a simplified model
of the optimization problem. A special kind of partition of the set of 2^n
vertices of an n-cube is considered as a model of a decision table with n
condition rows. The problem of converting such an n-cube partition into a
binary decision tree is discussed, based mainly on the simplified objective
function mentioned above.

The  structure  of  this  thesis  is  as  follows.   In  Chapter  2, 
decision  tables  and  their  conversion  problems  are  briefly  described 
using  conventional  terminology.  Then,  in  Chapter  3,    we  present 
mathematical  concepts  and  notations  which  appear  throughout  the  remainder 
of  this  thesis.  The  next  two  chapters  are  the  main  body  of  this  thesis. 

In Chapter 4 we describe a procedure, called Procedure R, to construct
decision trees for a given n-cube partition, and based on this procedure we
propose an algorithm, "iterated local minimization". It does not always
yield optimal solutions, but it generates suboptimal trees. The resulting
decision trees are compared quantitatively with the trees constructed by
Pollack's algorithm (Pollack [23]) and with optimal decision trees. This
chapter also contains an analysis of "rule-splits", in particular, lower
and upper bounds for the minimum number of required rule-splits over all
n-cube rule partitions. Since a decision tree which is optimal with respect
to one objective function is not necessarily optimal for other objective
functions, relationships among optimal decision trees under different
objective functions are also studied in that chapter.

Chapter 5 is another major division. The entire chapter is devoted to
a new topic, a decomposition theory of decision tables and decision trees.
The recent development of multiprocessor systems and parallel computers
provides the motivation for this study. We consider decomposing decision
tables or decision trees into smaller ones so that they can be processed
effectively in parallel. After a theoretical analysis of decomposition, we
propose a procedure, called Procedure D, to construct a pair of decision
trees from a given n-cube partition. Based on this procedure, a heuristic
algorithm is presented.

The Appendix contains a short summary of some of the main
contributions to our topic.


2.  DECISION  TABLES  AND  THEIR  CONVERSION  PROBLEMS 

2.1.   Decision  Tables 

Although flowcharts are a widely accepted means of describing the
logic of computer programs, they have several significant disadvantages
which should encourage analysts to seek alternative methods for stating
the pertinent aspects of a program. Decision tables provide such an
alternative. First, some of the disadvantages of flowcharting techniques
are listed:

1. Although flowcharts are often very appropriate for describing
scientific programs, where each box can represent a certain amount of
computation, they are often not appropriate for system programs, business
data processing or information retrieval, where a long sequence of logical
decisions must be made.

2. Flowcharts for complex problems tend to become lengthy and
difficult to follow and modify.

Decision  tables  tend  to  overcome  these  disadvantages  while 
providing  some  advantages  as  well. 


DOES HE HAVE A GOOD DRIVING RECORD ?    Y  Y  Y  N  N  N
IS HE OVER 25 YEARS OF AGE ?            Y  N  N  Y  Y  N
IS HE MARRIED ?                         -  Y  N  Y  N  -
---------------------------------------------------------
INSURE                                  X  X  X  X  -  -
CHARGE RISK RATE                        -  -  X  X  -  -
REJECT APPLICATION                      -  -  -  -  X  X

TABLE 2.1.


Table 2.1 is an example of a decision table for driver insurability.
It has four major sections. The condition stub is the upper left quadrant
and contains descriptions of the conditions on which decisions are to be
based. Conditions are usually represented as questions. The action stub
occupies the lower left quadrant and supplies all possible actions for the
conditions listed above. The condition entry section is found in the upper
right quadrant and answers the questions found in the condition stub. All
possible combinations of answers to the questions are formed here, where
the responses are restricted to "Y" to indicate "Yes" and "N" to indicate
"No". If no response is indicated, the response need not be checked for
that particular question, and a "-" (dash) is written there. The action
entry is the remaining quadrant of the table and indicates the appropriate
actions resulting from the conditions above. The only permissible entry
here is "X", to indicate "take this action". One or more actions may be
designated for each combination of responses. Each combination of
responses, together with the actions indicated for that combination, is
called a rule. The rules are usually numbered or lettered for
identification purposes. Below we list some advantages of using decision
tables.

1. Logic is stated precisely and compactly.

2. Complex logic is easier to understand, and the relationships among
variables are readily understood.

3. Decision tables lend themselves to updating and change.

4. The tables are appropriate for independent review and documentation.
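The four-quadrant layout maps directly onto a simple data structure. The following Python sketch is only an illustration. The encoding of each rule as a string of Y/N/- entries, in the order of the condition stub, is an assumption, as are the names TABLE_2_1, matches and decide; none of these are part of the original table. The sketch processes the table by scanning the rules one after another. This naive approach is what converting the table into a decision tree (Section 2.2) is meant to avoid.

```python
from itertools import product

# Table 2.1: one entry string per rule, conditions in stub order:
# (good driving record?, over 25?, married?); "-" means "need not be checked".
TABLE_2_1 = [
    ("YY-", {"INSURE"}),
    ("YNY", {"INSURE"}),
    ("YNN", {"INSURE", "CHARGE RISK RATE"}),
    ("NYY", {"INSURE", "CHARGE RISK RATE"}),
    ("NYN", {"REJECT APPLICATION"}),
    ("NN-", {"REJECT APPLICATION"}),
]


def matches(entries: str, answers: str) -> bool:
    """A rule applies when every non-dash entry equals the given answer."""
    return all(e == "-" or e == a for e, a in zip(entries, answers))


def decide(answers: str) -> set[str]:
    """Return the actions of the single rule that applies to answers, e.g. 'YNN'."""
    hits = [actions for entries, actions in TABLE_2_1 if matches(entries, answers)]
    if len(hits) != 1:
        raise ValueError(f"{answers!r} is covered by {len(hits)} rules")
    return hits[0]


# Every one of the 2**3 answer combinations must be covered by exactly one rule.
for combo in product("YN", repeat=3):
    decide("".join(combo))
```

For example, decide("YNN") returns {"INSURE", "CHARGE RISK RATE"}, which are the actions of the third rule. The final loop confirms that the six rules cover all eight answer combinations without overlap.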

Here we refer to the case of a large-scale decision table
implementation for the Census of Agriculture (1964) (McDaniel [18]).

A questionnaire with 335 questions was sent to each farm in over
3,100 counties. Then, after 600 to 800 items were tabulated, a decision
table of 600 pages was made. It generated 20,000 to 25,000 lines of code
and needed 3,000 hours of UNIVAC 1107 computer time. In total, 53
man-years of programming, with 14 man-years on the edit program, were
required.

Problems  such  as  these  stimulated  the  development  of  programming 
languages  which  are  based  on  decision  tables.  Some  of  these  are  listed 
in  chronological  order  of  their  development  (see,  e.g.,  McDaniel  [19]): 


TABSOL: an experimental tabular language for GE machines,

LOGTAB for the IBM 704, and FORTAB: an extension of FORTRAN
with a quite extensive and sophisticated decision table facility,

DETAB-X: a COBOL-oriented decision table language, and

DETAB-65: a further development of DETAB-X.

Associated with the development of these languages, there arose the
major problem of finding algorithms which compile decision tables into
efficient programs. This problem is the motivation and practical
background for our theoretical study.

2.2.   Conversion Problem

One important way to process a decision table by computer is to
transform it into a special kind of flowchart known as a decision tree.
Since there are usually many decision trees which correspond to a given
decision table, the problem arises of finding the ones which are most
efficient according to some criterion of optimality. Consider as an
example Table 2.2. It is a simplified decision table in that the action
stub and entry are not presented.


C1      Y   Y   Y   N
C2      Y   -   N   -
C3      N   Y   N   -
GO TO   R1  R2  R3  R4

TABLE 2.2.
This decision table can be realized by each of the decision trees shown
in Figures 2.1.(a), 2.1.(b), and 2.1.(c), among others.


FIGURE 2.1.(a).

FIGURE 2.1.(b).

FIGURE 2.1.(c).


They suggest some significant points about optimizing the conversion
of the table into a decision tree. These three trees embody the same
logical consequences but differ in the procedures they specify for
arriving at the rules. They are not equally good from the viewpoint of
memory requirement and processing time. For example, tree (c) always
requires that all three conditions be evaluated, while for trees (a) and
(b) fewer conditions sometimes suffice. To characterize this problem the
following quantities are introduced:

s_i: memory space required for condition C_i,
t_i: time required for processing condition C_i, and
p_j: probability with which the j-th rule, R_j, occurs.

Then the total memory requirement, S, and the average processing time, T,
for each of the decision trees can be calculated as follows:

(a)
$$S_a = s_1 + s_2 + s_3$$
$$T_a = (p_1 + p_3)(t_1 + t_2 + t_3) + p_2 (t_1 + t_3) + p_4 t_1$$

(b)
$$S_b = 2s_1 + s_2 + s_3$$
$$T_b = (p_1 + p_3)(t_1 + t_2 + t_3) + (p_2 + p_4)(t_1 + t_3)$$

(c)
$$S_c = 4s_1 + s_2 + 2s_3$$
$$T_c = t_1 + t_2 + t_3$$
The expression for T is a generalization of a quantity known as the
weighted external path length of a binary tree (see, e.g., Knuth [14]).
It becomes identical to this quantity if $t_1 = t_2 = t_3 = 1$. Since
$S_b - S_a = s_1$, $S_c - S_b = 2s_1 + s_3$, $T_b - T_a = p_4 t_3$ and
$T_c - T_b = (p_2 + p_4) t_2$, the inequalities $S_a < S_b < S_c$ and
$T_a < T_b < T_c$ hold for all positive values of $s_i$, $t_i$ and $p_j$
with $\sum_j p_j = 1$. Hence (a) is the best of the three decision trees
with respect to both memory requirement and processing time. However, in
general, a tree which is optimal in one respect need not be optimal in
another.
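The formulas above can also be checked mechanically. The following Python sketch encodes the three trees of Figure 2.1 as nested tuples. Because the figures themselves are not reproduced here, the tree structure has been read back from the expressions for S and T, and that reading is an assumption. The sketch computes S as the sum of s_i over the internal nodes. It computes T by sending every vertex of the 3-cube (Y = 1, N = 0) through the tree. Spreading each p_j evenly over the vertices covered by rule R_j is a further assumption. For these three trees that choice does not affect T, because all vertices of a rule reach terminals at the same cost. All names are illustrative.

```python
from itertools import product

# Rules of Table 2.2 with Y -> "1", N -> "0"; positions are C1, C2, C3.
RULES = {"R1": "110", "R2": "1-1", "R3": "100", "R4": "0--"}

# A tree is a rule name (terminal node) or (i, no_branch, yes_branch),
# where i is the 0-based index of the condition tested (0 -> C1, ...).
TREE_A = (0, "R4", (2, (1, "R3", "R1"), "R2"))
TREE_B = (2, (0, "R4", (1, "R3", "R1")), (0, "R4", "R2"))
TREE_C = (1,
          (2, (0, "R4", "R3"), (0, "R4", "R2")),
          (2, (0, "R4", "R1"), (0, "R4", "R2")))


def memory(tree, s):
    """S: sum of s_i over all internal (decision) nodes."""
    if isinstance(tree, str):
        return 0
    i, no, yes = tree
    return s[i] + memory(no, s) + memory(yes, s)


def walk(tree, vertex, t):
    """Follow a vertex such as '101' to a terminal; return (rule, time spent)."""
    spent = 0
    while not isinstance(tree, str):
        i, no, yes = tree
        spent += t[i]
        tree = yes if vertex[i] == "1" else no
    return tree, spent


def avg_time(tree, t, p):
    """T: expected time, with p[rule] spread evenly over the rule's vertices."""
    total = 0.0
    for bits in product("01", repeat=3):
        vertex = "".join(bits)
        rule, spent = walk(tree, vertex, t)
        cube = RULES[rule]
        assert all(c in ("-", b) for c, b in zip(cube, vertex)), "wrong terminal"
        total += p[rule] / 2 ** cube.count("-") * spent
    return total


s, t = (1.0, 2.0, 3.0), (1.0, 2.0, 4.0)
p = {"R1": 0.1, "R2": 0.2, "R3": 0.3, "R4": 0.4}
for name, tree in (("a", TREE_A), ("b", TREE_B), ("c", TREE_C)):
    print(name, memory(tree, s), round(avg_time(tree, t, p), 6))
```

With the sample values chosen above, the computed results agree with the closed forms: S = 6, 7 and 12, and T = 4.2, 5.8 and 7 for trees (a), (b) and (c), respectively.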

2.3.   Cubic Representation of Decision Tables

The purpose of this section is to introduce a representation of a
decision table by an n-dimensional cube. Using this cubic representation,
we can give a more intuitive interpretation of several procedures for
generating decision trees, as well as a mathematically more precise
formulation. The mathematical terminology and notation used in this thesis
are introduced in the next chapter. Here we discuss the cubic
representation of a decision table at an intuitive level.

Let us consider Table 2.2 again. If we replace Y and N by 1 and 0,
respectively, then a rule, say R2, becomes the triple (1,-,1). Similarly,
R1, R3 and R4 are (1,1,0), (1,0,0) and (0,-,-), respectively. Such triples
can be identified with vertices or certain sets of vertices (namely,
subcubes) of the 3-dimensional cube, as shown in Figure 2.2.


FIGURE  2.2. 


Note that the conditions C_i correspond to the coordinate axes of the
cube. R1 and R3 each correspond to a vertex, or 0-cube, R2 to an edge, or
1-cube, and R4 to a face, or 2-cube. Thus a decision table with n
conditions can be represented by a partition (of a special kind) of the set
of vertices of an n-cube.

Now we explain how a decision tree can be obtained from the cubic
representation. The decision tree of Figure 2.1.(a) is taken as an
example. The corresponding procedure is illustrated in Figure 2.3.(a).
FIGURE 2.3.(a).


The process starts by separating the 3-cube into two 2-cubes by
removing the coordinate C1. Then each 2-cube that still contains more than
one rule is separated again into 1-cubes by removing another coordinate.
This process continues until each separated cube consists of exactly one
rule, that is, until every rule is identified.
Let us consider another example, Figure 2.1.(b). The corresponding
process is illustrated in Figure 2.3.(b).


FIGURE 2.3.(b).


This case creates a somewhat different situation. When C3 is taken
first, the 2-cube R4 is split into two 1-cubes, R4' and R4''. They are the
same rule but are identified separately at the terminals of the decision
tree. Such a "rule-split" (i.e., the separation of a cube consisting of a
single rule) increases the number of nodes of the resulting decision tree,
i.e., the number of condition boxes which appear in the tree as well as the
number of terminal nodes, which represent the cases distinguished by the
tree. Such rule-splits occur in both of the trees in Figures 2.1.(b) and
2.1.(c).
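The splitting process of Figures 2.3.(a) and 2.3.(b) can be simulated directly on the 0/1/- strings. The Python sketch below is only illustrative and is not Procedure R of Chapter 4. Its choice rule is an assumption made for simplicity: it always splits on the first coordinate of a fixed `order` that is still free in the current region. The sketch returns the tree, its number of internal nodes, and its number of rule-splits, counted as the rule fragments cut by each split. Each cut adds one terminal node, so this count equals the number of terminals minus the number of rules.

```python
def intersect(a: str, b: str):
    """Intersection of two cubes over {'0', '1', '-'}, or None if they are disjoint."""
    out = []
    for x, y in zip(a, b):
        if x == "-":
            out.append(y)
        elif y == "-" or x == y:
            out.append(x)
        else:
            return None
    return "".join(out)


def build(rules: dict, order: tuple, region: str = ""):
    """Split `region` along coordinates until it meets exactly one rule.

    A tree is a rule name or (i, zero_subtree, one_subtree), i being a 0-based
    condition index. Returns (tree, internal_nodes, rule_splits). Assumes the
    rules partition the cube (no Else-rule, no overlaps).
    """
    region = region or "-" * len(order)
    live = {name: c for name, cube in rules.items()
            if (c := intersect(cube, region)) is not None}
    if len(live) == 1:
        return next(iter(live)), 0, 0
    # Hypothetical choice rule: the first coordinate in `order` still free here.
    i = next(k for k in order if region[k] == "-")
    cuts = sum(1 for c in live.values() if c[i] == "-")   # rule fragments split here
    zero, z_nodes, z_cuts = build(rules, order, region[:i] + "0" + region[i + 1:])
    one, o_nodes, o_cuts = build(rules, order, region[:i] + "1" + region[i + 1:])
    return (i, zero, one), 1 + z_nodes + o_nodes, cuts + z_cuts + o_cuts


# Table 2.2 with Y -> 1 and N -> 0; positions are C1, C2, C3.
RULES = {"R1": "110", "R2": "1-1", "R3": "100", "R4": "0--"}
for order in ((0, 2, 1), (2, 0, 1), (1, 2, 0)):
    tree, nodes, splits = build(RULES, order)
    print(order, nodes, splits)
```

The three orders (C1, C3, C2), (C3, C1, C2) and (C2, C3, C1) reproduce trees (a), (b) and (c) of Figure 2.1. These trees have 3, 4 and 7 internal nodes, with 0, 1 and 4 rule-splits, respectively.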

Before we develop this cubic representation in the succeeding
chapters, we make some remarks about such conventional terms as the
Else-rule, redundancy and contradiction of a decision table (see, e.g.,
Pollack [24]).

(1) In general, the set of all rules of a decision table does not
cover its associated cube completely. This means that some vertices of the
n-cube are left unspecified and no action is taken for these unspecified
vertices. We call the set of these vertices the Else-rule. Unlike an
ordinary rule, the Else-rule is not necessarily a single subcube.

(2) It may also occur that, for a given decision table, two or more
different rules are assigned to the same vertex. In other words, they
overlap at that vertex. If the series of actions taken for these rules is
the same, redundancy exists. If this is not the case, however, then there
are different series of actions for that vertex. This is called
contradiction.
In this thesis, we consider neither the case of the Else-rule nor the
case of redundancy or contradiction in a decision table, but only the case
where the set of all vertices is partitioned into a set of disjoint
subcubes. These restrictions are natural, since a decision tree whose
internal nodes correspond to single conditions C_i (i.e., not logical
combinations of conditions such as C_i ∨ C_j · C_k) can realize only such
a special type of partition of all vertices.
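The restriction just stated can be checked mechanically. In the Python sketch below, whose function names are illustrative, each rule is written as a string over {0, 1, -}. The sketch expands each rule into the vertices it covers. It accepts the rules only if they are pairwise disjoint, so that there is no redundancy or contradiction, and together cover all 2^n vertices, so that there is no Else-rule. The dimension of each subcube is simply its number of dashes.

```python
from itertools import product


def expand(cube: str) -> set[str]:
    """All vertices covered by a cube such as '1-1' (each '-' ranges over 0, 1)."""
    choices = [("0", "1") if c == "-" else (c,) for c in cube]
    return {"".join(bits) for bits in product(*choices)}


def is_cube_partition(cubes, n: int) -> bool:
    """True if the cubes are pairwise disjoint and cover every vertex of the n-cube."""
    covered = set()
    for cube in cubes:
        vertices = expand(cube)
        if covered & vertices:       # overlapping rules: redundancy or contradiction
            return False
        covered |= vertices
    return len(covered) == 2 ** n    # uncovered vertices would form an Else-rule


# Table 2.2 with Y -> 1 and N -> 0; positions are C1, C2, C3.
TABLE_2_2 = {"R1": "110", "R2": "1-1", "R3": "100", "R4": "0--"}
dimensions = {name: cube.count("-") for name, cube in TABLE_2_2.items()}
```

Here dimensions evaluates to {"R1": 0, "R2": 1, "R3": 0, "R4": 2}. This matches the reading given in Section 2.3: R1 and R3 are vertices, R2 is an edge and R4 is a face.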
3.   MATHEMATICAL PRELIMINARIES

3.1.   Introduction

In this chapter we first review some basic algebraic concepts
concerning the lattice of partitions of a set, and then introduce some
additional terminology and notation. Using these, we put decision table
problems into a more abstract and simplified form for the theoretical
developments in succeeding chapters. In particular, n-cube partitions and
decision trees will be used instead of decision tables and flowcharts,
respectively.

Among the concepts introduced in this chapter, two are especially
basic and important for understanding this thesis. One of these is the
inequality between two cube partitions, and the other is the
multiplication of cube partitions. The former will often be used in
describing the conversion procedure from cube partitions into decision
trees in Chapter 4, while the latter will play an essential role in the
decomposition theory of decision tables presented in Chapter 5.

In the last section of this chapter, we present a preliminary study
of some sets of n-cube partitions and show that each set forms a lattice
with the introduction of two operations. A more advanced study of this
lattice in relation to optimality is, however, presented in Chapter 4.
3.2.   Algebraic Foundations

This entire section is excerpted from the book by Hartmanis and
Stearns [5].

A relation between a set S and a set T is a subset R of S × T, and for
(s,t) in R we write s R t. Thus,

$$R = \{ (s,t) \mid s \, R \, t \}.$$

A relation R on S (i.e., a subset of S × S) is:

reflexive if, for all s, s R s;

symmetric if s R t implies t R s;

transitive if s R t and t R u implies s R u.

A relation R on S is an equivalence relation on S if R is reflexive,
symmetric and transitive.

If R is an equivalence relation on S, then for every s in S, the set

$$B_R(s) = \{ t \mid s \, R \, t \}$$

is an equivalence class (i.e., the equivalence class defined by s).

A partition π on S is a collection of disjoint subsets of S whose set
union is S, i.e.,

$$\pi = \{ B_\alpha \}$$

such that

$$B_\alpha \cap B_\beta = \emptyset \quad \text{for } \alpha \neq \beta$$

and

$$\bigcup_\alpha B_\alpha = S.$$

We refer to the sets of π as blocks of π and designate the block which
contains s by B_π(s). We write

$$s \equiv t \ (\pi)$$

if and only if s and t are contained in the same block of π. Note that
s ≡ t (π) if and only if B_π(s) = B_π(t).
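These definitions translate directly into code. As a small illustration, with a set, a relation and function names chosen for this example rather than taken from Hartmanis and Stearns, the sketch does three things. It checks that a relation given as a set of pairs is an equivalence relation. It forms the equivalence classes B_R(s), which make up a partition. It then answers B_π(s) and s ≡ t (π) for that partition.

```python
def is_equivalence(S, R) -> bool:
    """Check reflexivity, symmetry and transitivity of R, a set of pairs over S."""
    reflexive = all((s, s) in R for s in S)
    symmetric = all((t, s) in R for (s, t) in R)
    transitive = all((s, u) in R for (s, t) in R for (t2, u) in R if t == t2)
    return reflexive and symmetric and transitive


def classes(S, R) -> set[frozenset]:
    """The partition of S into equivalence classes B_R(s) = {t | s R t}."""
    return {frozenset(t for t in S if (s, t) in R) for s in S}


def block_of(partition, s) -> frozenset:
    """B_pi(s): the block of the partition that contains s."""
    return next(B for B in partition if s in B)


def congruent(partition, s, t) -> bool:
    """s ≡ t (pi): s and t lie in the same block."""
    return block_of(partition, s) == block_of(partition, t)


S = set(range(6))
R = {(s, t) for s in S for t in S if s % 3 == t % 3}   # "same remainder mod 3"
pi = classes(S, R)
```

Here pi is {{0, 3}, {1, 4}, {2, 5}}. As a result, congruent(pi, 1, 4) holds, while congruent(pi, 1, 2) does not.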
A binary relation R on S is a partial ordering of S if and only if R is

(i) reflexive: s R s for all s in S,

(ii) antisymmetric: s R t and t R s implies t = s,

(iii) transitive: s R t and t R u implies s R u.

We refer to a set S with a given partial ordering R as a partially
ordered set. When a relation R is a partial ordering, we use the more
suggestive symbol "≤" instead of R, and the partially ordered set is
represented by the pair (S, ≤).

Let (S, ≤) be a partially ordered set and T be a subset of S. Then s
(in S) is the least upper bound (l.u.b.) of T if and only if

(i) s ≥ t for all t in T;

(ii) s' ≥ t for all t in T implies that s' ≥ s.

Dually, s is the greatest lower bound (g.l.b.) of T if and only if

(i) s ≤ t for all t in T;

(ii) s' ≤ t for all t in T implies that s' ≤ s.

A lattice is a partially ordered set, L = (S, ≤), which has an
l.u.b. and a g.l.b. for every pair of elements.

We  now  give  an  equivalent  definition  of  a  lattice  in  terms  of 
the  l.u.b.  and  g.l.b.  operations. 
A lattice L is a triplet

$$L = (S, \cdot, +)$$

where S is a non-empty set of lattice elements and "·" and "+" are binary
operations satisfying the four postulates (i) to (iv) below, known
respectively as the idempotent, commutative, associative and absorption
laws.

(i) $x \cdot x = x$ and $x + x = x$

(ii) $x \cdot y = y \cdot x$ and $x + y = y + x$

(iii) $x \cdot (y \cdot z) = (x \cdot y) \cdot z$ and $x + (y + z) = (x + y) + z$

(iv) $x \cdot (x + y) = x$ and $x + (x \cdot y) = x$.

Let L = (S, ·, +) satisfy the conditions of the above definition of a
lattice and define

$$x \le y \text{ if and only if } x \cdot y = x.$$

Then it can be verified that (S, ≤) is a lattice and

$$\text{g.l.b.}(x, y) = x \cdot y \quad \text{and} \quad \text{l.u.b.}(x, y) = x + y.$$

If  L  is  a  finite  lattice,  then  L  itself  has  a  l.u.b.  and  g.l.b., 
denoted  by  I  and  0,  respectively.   Element  I  is  called  the  identity  because 

I  •  x  =  x  for  all  x  in  L. 

Element  0  is  called  the  zero  because 

x  +  0  =  x  for  all  x  in  L. 

Let L = (S, ·, +) be a lattice and T a nonvoid subset of S. Then
L' = (T, ·, +) is a sublattice of L if and only if x and y in T implies
that x · y and x + y are in T.
3.3.   Partitions of an n-cube and Decision Trees

The n-dimensional cube, or n-cube for short, has 2^n vertices with n
lines emanating from each vertex. The vertices of the n-cube are labeled
with n-tuples of zeros and ones such that two vertices are connected by a
line if and only if their labels differ in exactly one position. As
examples, the cubes for n = 1, 2 and 3 are shown in Figure 3.1.


FIGURE 3.1. (Panels: 1-cube, 2-cube, 3-cube.)


Let us agree to call a single vertex a 0-cube. Then a pair of
adjacent 0-cubes determines an edge, or 1-cube. If the two vertices are,
say, (1,0,1) and (0,0,1), then we shall denote this 1-cube by (-,0,1). As
"-" ranges over {0,1}, this represents the two vertices of the 1-cube. In a
similar way, a 2-cube is made up of four 0-cubes. For instance, (0,0,1),
(0,1,1), (1,0,1) and (1,1,1) make up the 2-cube (-,-,1), where each "-"
ranges over {0,1} independently.
Now we define two different kinds of n-cube partitions as follows.


DEFINITION 3.1.

A partition π of the set of 2^n vertices of an n-cube is called an
n-cube partition. The number of blocks of a partition π is denoted by
#(π).

If each block B_i of this partition π is a single k-cube (k ≤ n), then π
is called an n-cube rule partition.


EXAMPLE 3.1.

We show a 3-cube rule partition π1 below as an example:

π1 = {B1, B2, B3, B4}, where B1 = {(0,0,0)}, B2 = {(1,0,0)},
B3 = {(-,0,1)}, and B4 = {(-,1,-)}. We note that each B_i (i = 1,2,3,4)
of π1 is a single cube.

The following partition π2 is a 3-cube partition but not a rule
partition, because B1 is not a single cube, nor is B3: π2 = {B1, B2, B3,
B4}, where B1 = {(0,0,0), (1,0,1)}, B2 = {(0,0,1)}, B3 = {(0,1,0), (0,1,1),
(1,1,1)} and B4 = {(1,-,0)}.
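Definition 3.1 can be tested block by block. In the Python sketch below, whose names are illustrative, as_cube computes the smallest cube containing a block by fixing every coordinate on which all of the block's vertices agree. It accepts the block only if that cube contains no extra vertices. Applied to Example 3.1, the test accepts π1 and rejects π2.

```python
from itertools import product


def expand(cube: str) -> set[str]:
    """All vertices covered by a cube such as '-0-'."""
    choices = [("0", "1") if c == "-" else (c,) for c in cube]
    return {"".join(bits) for bits in product(*choices)}


def as_cube(block):
    """Return the cube equal to block (a set of vertex strings), or None."""
    first = next(iter(block))
    pattern = "".join(
        first[i] if all(v[i] == first[i] for v in block) else "-"
        for i in range(len(first))
    )
    return pattern if expand(pattern) == set(block) else None


def is_rule_partition(blocks) -> bool:
    """True if every block is a single cube (Definition 3.1)."""
    return all(as_cube(B) is not None for B in blocks)


# Example 3.1, with vertices written as strings such as "101".
pi1 = [expand(c) for c in ("000", "100", "-01", "-1-")]
pi2 = [{"000", "101"}, {"001"}, {"010", "011", "111"}, expand("1-0")]
```

For instance, the smallest cube containing {000, 101} is (-,0,-), which has four vertices, so that block is rejected.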
Both the rule partition and the (general) partition of the n-cube can
be considered as simple models of a decision table with n condition rows.
If we are concerned only with the condition stub and condition entry
portions of a decision table (which has no Else-rule, redundancy or
contradiction), a rule partition corresponds uniquely to a decision table
with n conditions. An n-cube partition, more generally, can be considered
as another model of a decision table, in which a block of the partition
corresponds to an action of the decision table.

As far as conversion techniques are concerned, these two kinds of
partitions serve as simple theoretical models of a decision table for the
discussion of its optimality.

Next  we  define  binary  trees,  and  then  binary  decision  trees. 


DEFINITION  3.2. 

A binary tree T_i with i internal nodes and i+1 terminal nodes (i ≥ 0)
is defined recursively as follows:

If i = 0, then it consists of a single terminal node; otherwise it is a
triple (T_ℓ, v, T_r), where v is a distinguished internal node called the
root of T_i, and T_ℓ and T_r are binary trees with ℓ and r internal nodes,
respectively, and with ℓ+1 and r+1 terminal nodes, respectively, where
ℓ ≥ 0, r ≥ 0 and ℓ + r = i − 1.

DEFINITION 3.3.

A binary decision tree involving n conditions C_1, C_2, ..., C_n is a
binary tree each of whose internal nodes is labeled with a condition C_j
such that, for any path from the root of the tree to a terminal node, no
condition C_j appears more than once along this path.

Next we associate an n-cube rule partition with a binary decision
tree in the following manner.

DEFINITION 3.4.

The n-cube rule partition associated with a binary decision tree T is
defined as follows:

For any terminal node t of T and the path C_{t_1}, C_{t_2}, ..., C_{t_p}
from the root to t, where t_j ∈ {1, 2, ..., n}, associate with t the
k-cube (k = n − p)

$$B_t = \{ (x_1, x_2, \ldots, x_n) \}$$

in such a way that

1) x_i is "-" if C_i is not in the path C_{t_1}, C_{t_2}, ..., C_{t_p}, and

2) x_i is 0 (1) if C_i is in the path C_{t_1}, C_{t_2}, ..., C_{t_p} and
the terminal node t lies in the left (right) sub-decision tree of the
internal node C_i.

It is obvious that all these blocks B_t form an n-cube rule partition and
that the partition is unique.
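Definition 3.4 amounts to recording, along each root-to-terminal path, which conditions were tested and which branch was taken. The Python sketch below shows one way to do this. The nested-tuple tree format, with the left subtree as the 0-branch, is an assumption. The tree used in the check is reconstructed from the blocks of Example 3.2, with B4 = {(1,-,-)} taken as the block of the remaining vertices. It is therefore an assumption and not a copy of the original figure. Its root must be C1, because B4 fixes only x_1.

```python
def associated_partition(tree, n: int, path=None) -> dict:
    """Map each terminal label of a binary decision tree to its k-cube.

    A tree is a terminal label (str) or (i, left, right), where i is the
    0-based index of condition C_{i+1}; the left subtree is the 0-branch.
    Raises ValueError if a condition repeats on a path (Definition 3.3).
    """
    if path is None:
        path = ["-"] * n
    if isinstance(tree, str):
        return {tree: "".join(path)}
    i, left, right = tree
    if path[i] != "-":
        raise ValueError(f"condition C{i + 1} appears twice on one path")
    blocks = {}
    for bit, subtree in (("0", left), ("1", right)):
        sub = associated_partition(subtree, n, path[:i] + [bit] + path[i + 1:])
        for label, cube in sub.items():
            if label in blocks:
                raise ValueError(f"terminal label {label!r} is used twice")
            blocks[label] = cube
    return blocks


def count_nodes(tree) -> tuple[int, int]:
    """(internal, terminal) node counts; Definition 3.2 gives terminal = internal + 1."""
    if isinstance(tree, str):
        return 0, 1
    _, left, right = tree
    (li, lt), (ri, rt) = count_nodes(left), count_nodes(right)
    return 1 + li + ri, lt + rt


# Assumed tree for Example 3.2: C1 at the root, then C3, then C2.
TREE = (0, (2, "B1", (1, "B2", "B3")), "B4")
```

On this tree the function returns B1 = 0-0, B2 = 001, B3 = 011 and B4 = 1--, which is a rule partition of the 3-cube.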


EXAMPLE 3.2.

Assume n = 3. Associated with each terminal node i is a block B_i
(i = 1,2,3,4) as follows:

B1 = {(0,-,0)}

B2 = {(0,0,1)}

B3 = {(0,1,1)}

B4 = {(1,-,-)}

They form the following 3-cube partition π.

[Figure: the decision tree of Example 3.2 and its 3-cube partition π]
DEFINITION 3.5.

We say that a decision tree realizes a rule partition π if and only if
each terminal node of the tree represents a block of the partition π and
vice versa. A partition is realizable if and only if there exists a
decision tree which realizes the partition. We note that a realizable
partition is always a rule partition.


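EXAMPLE 3.3.

We show an example of a non-realizable 3-cube partition π:

[Figure: the non-realizable 3-cube partition π]

The blocks B_i of π are B1 = {(1,0,-)}, B2 = {(0,-,1)}, B3 = {(-,1,0)},
B4 = {(0,0,0)}, and B5 = {(1,1,1)}. Whichever condition is tested first,
some block is split (C1 splits B3, C2 splits B2 and C3 splits B1), so no
decision tree can represent each block by a single terminal node.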
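For small n, realizability can be decided by brute force. A decision tree realizes π only if no internal node splits a block. The root must therefore test a condition on which every block is fixed, and the same requirement applies recursively in both halves. The Python sketch below applies this test. It is an illustrative exhaustive search, exponential in n, and not an algorithm from this thesis. It confirms that the partition of Example 3.3 is not realizable: (1,0,-) is split by C3, (0,-,1) by C2 and (-,1,0) by C1. By contrast, π1 of Example 3.1 and the partition of Table 2.2 are realizable.

```python
def intersect(a: str, b: str):
    """Intersection of two cubes over {'0', '1', '-'}, or None if they are disjoint."""
    out = []
    for x, y in zip(a, b):
        if x == "-":
            out.append(y)
        elif y == "-" or x == y:
            out.append(x)
        else:
            return None
    return "".join(out)


def realizable(blocks: list, region: str = "") -> bool:
    """Exhaustive test of Definition 3.5 for a rule partition given as cubes."""
    region = region or "-" * len(blocks[0])
    live = [c for b in blocks if (c := intersect(b, region)) is not None]
    if len(live) == 1:
        return True
    for i, r in enumerate(region):
        # C_{i+1} may be tested here only if it splits no block in the region.
        if r == "-" and all(c[i] != "-" for c in live):
            zero = region[:i] + "0" + region[i + 1:]
            one = region[:i] + "1" + region[i + 1:]
            if realizable(blocks, zero) and realizable(blocks, one):
                return True
    return False
```

The blocks are passed as a list of cube strings, for example ["10-", "0-1", "-10", "000", "111"] for Example 3.3.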


REMARK 

As we have shown, a decision tree determines a unique n-cube rule
partition. However, the converse is not true. In other words, more than
one decision tree may realize the same partition, as the following
example shows.
[Figure: two decision trees T1 and T2 and the partition π they realize]

Two different decision trees T1 and T2 realize the same partition π.


3.4.   Lattices of n-cube Partitions

So far we have learned that there exist three different kinds of
n-cube partitions, i.e., (general) partitions, rule partitions and
realizable partitions. Since a realizable partition is necessarily a rule
partition, we have

$$S_0 \subseteq S_r \subseteq S,$$

where S_0, S_r and S are the sets of all realizable, rule and general
n-cube partitions, respectively.

In this section we study the sets S, S_r and S_0 of all general, rule
and realizable n-cube partitions. In more detail, we introduce a binary
relation "≤" between two n-cube partitions and show that these sets with
this relation are partially ordered sets. Then three binary operations
"·", "+" and "⊕" are introduced between n-cube partitions.
We discuss closure properties of S, S_r and S_0 under these operations
and  show  that  (S,  ♦,  +)  and  (S  ,  ',  ©)  are  lattices.  More  advanced 
study of optimality problems will be presented in Chapter 4, based on
these lattices.

The relation "≤" and the multiplication "·", both defined between two
cube partitions, are very important concepts and are used frequently in
the succeeding two chapters. We introduce the binary relation "≤" first.


DEFINITION 3.6.

For π1 and π2, we say that π2 is larger than or equal to π1, and write

$$\pi_1 \le \pi_2,$$

if and only if, for any two vertices v1 and v2 of the n-cube,

$$v_1 \equiv v_2 \ (\pi_1) \text{ implies } v_1 \equiv v_2 \ (\pi_2),$$

that is, π1 ≤ π2 holds if and only if every block of π1 is contained in a
block of π2.
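Both forms of Definition 3.6 are easy to state in code. In this Python sketch, whose names are illustrative and in which partitions are lists of vertex sets, leq uses the block-containment form and leq_by_vertices uses the pairwise form. On the small 2-cube examples below the two forms agree. Note that `<=` between Python sets is the subset test.

```python
from itertools import combinations


def block_of(partition, v):
    """The block of the partition that contains vertex v."""
    return next(B for B in partition if v in B)


def leq(p1, p2) -> bool:
    """p1 <= p2: every block of p1 lies inside some block of p2."""
    return all(any(B1 <= B2 for B2 in p2) for B1 in p1)


def leq_by_vertices(p1, p2) -> bool:
    """p1 <= p2: v1 ≡ v2 (p1) implies v1 ≡ v2 (p2) for every pair of vertices."""
    vertices = set().union(*p1)
    return all(block_of(p2, a) == block_of(p2, b)
               for a, b in combinations(vertices, 2)
               if block_of(p1, a) == block_of(p1, b))


# Partitions of the 2-cube: {00, 01, 1-} is finer than {0-, 1-}.
finer = [{"00"}, {"01"}, {"10", "11"}]
coarser = [{"00", "01"}, {"10", "11"}]
other = [{"00", "10"}, {"01", "11"}]
```

Here finer ≤ coarser holds but coarser ≤ finer does not. Moreover, finer and other are incomparable, which illustrates that "≤" is only a partial ordering.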

Since this relation satisfies the three properties of a partial
ordering, we can state the following proposition.

PROPOSITION 3.1.

The binary relation "≤" above is a partial ordering of S, S_r and S_0.

Next we introduce the binary operations "·", "+" and "⊕" among n-cube
partitions.
DEFINITION 3.7.

If π1 and π2 are partitions, then

(i) π1 · π2 is a partition such that

$$v_1 \equiv v_2 \ (\pi_1 \cdot \pi_2) \text{ if and only if } v_1 \equiv v_2 \ (\pi_1) \text{ and } v_1 \equiv v_2 \ (\pi_2);$$

(ii) π1 + π2 is a partition such that v1 ≡ v2 (π1 + π2) if and only if
there exists a sequence of vertices

$$v_1 = u_0, u_1, \ldots, u_l = v_2$$

such that

$$u_i \equiv u_{i+1} \ (\pi_1) \ \text{ or } \ u_i \equiv u_{i+1} \ (\pi_2) \quad \text{for } 0 \le i \le l - 1.$$

The procedure to form $\pi_1 \cdot \pi_2$ is very simple, since the blocks of $\pi_1 \cdot \pi_2$ are obtained by intersecting blocks of $\pi_1$ and $\pi_2$ (keeping the nonempty intersections).
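A minimal sketch of this intersection rule, under the same illustrative assumptions as above (0/1-string vertices, `frozenset` blocks; the name `partition_product` is hypothetical):

```python
def partition_product(p1, p2):
    """pi1 . pi2: the nonempty intersections of a block of p1 with a block of p2."""
    return {b1 & b2 for b1 in p1 for b2 in p2 if b1 & b2}


by_c1 = {frozenset({"00", "01"}), frozenset({"10", "11"})}  # blocks C1 = 0, C1 = 1
by_c2 = {frozenset({"00", "10"}), frozenset({"01", "11"})}  # blocks C2 = 0, C2 = 1

print(sorted(sorted(b) for b in partition_product(by_c1, by_c2)))
# [['00'], ['01'], ['10'], ['11']]
```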

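The process to obtain $\pi_1 + \pi_2$ is longer, but still straightforward. Let $B_\pi(v)$ denote the block of $\pi$ containing the vertex $v$. To compute $B_{\pi_1 + \pi_2}(v)$ we proceed inductively. Let

$$B_1(v) = B_{\pi_1}(v) \cup B_{\pi_2}(v),$$

and for $i \ge 1$ let

$$B_{i+1}(v) = B_i(v) \cup \bigcup \{\, B \mid B \text{ is a block of } \pi_1 \text{ or } \pi_2, \text{ and } B \cap B_i(v) \ne \emptyset \,\}.$$

Then $B_{\pi_1 + \pi_2}(v) = B_i(v)$ for any $i$ such that $B_{i+1}(v) = B_i(v)$.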
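The iteration above can be transcribed directly. This is an illustrative sketch under the same assumptions as before (0/1-string vertices, `frozenset` blocks); the helper names `block_of`, `sum_block` and `partition_sum` are hypothetical. Each pass adds every block of $\pi_1$ or $\pi_2$ that meets the current set, and the loop stops at the first $i$ with $B_{i+1}(v) = B_i(v)$.

```python
def block_of(p, v):
    """B_pi(v): the block of partition p that contains vertex v."""
    return next(b for b in p if v in b)


def sum_block(p1, p2, v):
    """B_{pi1+pi2}(v), computed by growing B_i(v) until it stops changing."""
    current = block_of(p1, v) | block_of(p2, v)      # B_1(v)
    while True:
        grown = set(current)
        for b in list(p1) + list(p2):                # every block of pi1 or pi2 ...
            if b & current:                          # ... that meets B_i(v)
                grown |= b
        if grown == current:                         # B_{i+1}(v) = B_i(v): done
            return frozenset(current)
        current = grown


def partition_sum(p1, p2):
    """pi1 + pi2: one block per vertex; equal blocks collapse in the set."""
    vertices = set().union(*p1)
    return {sum_block(p1, p2, v) for v in vertices}


# Two partitions of the 2-cube whose sum chains 00, 01 and 11 together.
p1 = {frozenset({"00", "01"}), frozenset({"10"}), frozenset({"11"})}
p2 = {frozenset({"00"}), frozenset({"01", "11"}), frozenset({"10"})}
print(sorted(sorted(b) for b in partition_sum(p1, p2)))
# [['00', '01', '11'], ['10']]
```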

EXAMPLE 3.4.

We illustrate $\pi_1 \cdot \pi_2$ and $\pi_1 + \pi_2$ by the following examples.

[Figure: $\pi_1$, $\pi_2$ and $\pi_1 \cdot \pi_2$]

[Figure: $\pi_1$, $\pi_2$ and $\pi_1 + \pi_2$]


NOTATION 3.1.

Repeated multiplication and addition are represented by

$$\pi_1 \cdot \pi_2 \cdots \pi_n = \prod_{i=1}^{n} \pi_i$$

and

$$\pi_1 + \pi_2 + \cdots + \pi_n = \sum_{i=1}^{n} \pi_i.$$

According to the first definition of a lattice in Section 3.2, it is easy to show the following.

PROPOSITION 3.2.

The partially ordered set $(S, \le)$ of all n-cube partitions is a lattice, and

$$\text{g.l.b.}(\pi_1, \pi_2) = \pi_1 \cdot \pi_2, \qquad \text{l.u.b.}(\pi_1, \pi_2) = \pi_1 + \pi_2.$$

Since the binary relation "≤" can equivalently be defined by: $\pi_1 \le \pi_2$ if and only if $\pi_1 \cdot \pi_2 = \pi_1$, if and only if $\pi_1 + \pi_2 = \pi_2$, we can prove the following statement by using the second definition of a lattice.

PROPOSITION 3.3.

The set $(S, \cdot, +)$ of all n-cube partitions with the two binary operations "·" and "+" is a lattice.

We know that the two other sets, $S_r$ and $S_o$, of all rule and realizable partitions are subsets of $S$ and are also partially ordered sets with respect to the relation "≤". As we now show, however, $S_r$ and $S_o$ with the two operations "·" and "+" are not sublattices of $(S, \cdot, +)$. To this end, we investigate these two sets, $S_r$ and $S_o$, more carefully.

PROPOSITION 3.4.

1) With respect to rule partitions:

If $\pi_1, \pi_2 \in S_r$, then $\pi_1 \cdot \pi_2 \in S_r$. However, there exists a pair of rule partitions $\pi_1$ and $\pi_2$ ($\pi_1, \pi_2 \in S_r$) such that $\pi_1 + \pi_2 \notin S_r$.

2) With respect to realizable partitions:

If $\pi_1, \pi_2 \in S_o$, then $\pi_1 \cdot \pi_2 \in S_o$. However, there exists a pair of realizable partitions $\pi_1$ and $\pi_2$ ($\pi_1, \pi_2 \in S_o$) such that $\pi_1 + \pi_2 \notin S_o$.

3) There exists a pair consisting of a rule partition, $\pi_1$, and a realizable partition, $\pi_2$, such that $\pi_1 \le \pi_2$ and $\pi_1$ is not realizable.
PROOF: The proofs of the first statements of 1) and 2) are omitted. To prove the second statements of 1) and 2), it is sufficient to exhibit the following two partitions $\pi_1$ and $\pi_2$.

[Figure: $\pi_1$, $\pi_2$ and $\pi_1 + \pi_2$]

Both $\pi_1$ and $\pi_2$ are realizable (hence also rule) partitions. However, $\pi_1 + \pi_2$ is neither a rule partition nor a realizable partition.

To prove 3), we exhibit the following two partitions $\pi_1$ and $\pi_2$.

[Figure: $\pi_1$, $\pi_2$ and $\pi_1 \cdot \pi_2$]

$\pi_1$ is a rule partition but not a realizable partition. $\pi_2$ is a realizable partition consisting of only one block (the 3-cube itself), and $\pi_1 \le \pi_2$. However, $\pi_1 \cdot \pi_2 = \pi_1$, and so $\pi_1 \cdot \pi_2$ is not realizable.
By the above proposition, we learn that the sets $S_r$ and $S_o$ are not closed under the operation "+", while they are closed under the operation "·". Therefore, $(S_r, \cdot, +)$ and $(S_o, \cdot, +)$ are not sublattices of the lattice $(S, \cdot, +)$.

Instead of the operation "+", we next define an operation "⊕" between two partitions as follows.


DEFINITION 3.8.

$\pi_1 \oplus \pi_2$ is a rule partition which satisfies

1) $\pi_1 \oplus \pi_2 \ge \pi_1 + \pi_2$, and

2) for any $\pi \in S_r$ such that $\pi_1 + \pi_2 \le \pi$, $\pi_1 \oplus \pi_2 \le \pi$ holds.


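EXAMPLE 3.3.

We show two examples of the operation "⊕".

[Figure: $\pi_1$, $\pi_2$ and $\pi_1 \oplus \pi_2$]

[Figure: $\rho_1$, $\rho_2$ and $\rho_1 \oplus \rho_2$]

The process to obtain $\pi_1 \oplus \pi_2$ from $\pi_1$ and $\pi_2$ is omitted.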
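Since the process is omitted, the following is only one possible construction, sketched under our own assumption that a rule partition is exactly a partition whose blocks are subcubes. Starting from $\pi_1 + \pi_2$, each block is replaced by the smallest subcube containing it, blocks that now overlap are merged, and this is repeated until nothing changes. Any rule partition $\pi \ge \pi_1 + \pi_2$ contains each such subcube inside one of its blocks, so the fixed point is the smallest rule partition above $\pi_1 + \pi_2$. All function names are hypothetical.

```python
from itertools import product


def merge_overlapping(blocks):
    """Merge vertex sets that (transitively) share a vertex; for pi1, pi2 this is pi1 + pi2."""
    merged = []
    for block in blocks:
        block = set(block)
        disjoint = []
        for other in merged:
            if other & block:
                block |= other          # absorb a block that meets the current one
            else:
                disjoint.append(other)
        merged = disjoint + [block]
    return {frozenset(b) for b in merged}


def hull(block):
    """Smallest subcube containing the vertices, written as a 0/1/- pattern."""
    pattern = []
    for i in range(len(next(iter(block)))):
        values = {v[i] for v in block}
        pattern.append(values.pop() if len(values) == 1 else "-")
    return "".join(pattern)


def subcube(pattern):
    """All vertices of the subcube described by a 0/1/- pattern."""
    choices = ["01" if c == "-" else c for c in pattern]
    return frozenset("".join(bits) for bits in product(*choices))


def oplus(p1, p2):
    """Smallest rule partition above pi1 + pi2, found as a fixed point."""
    blocks = merge_overlapping(list(p1) + list(p2))          # pi1 + pi2
    while True:
        widened = merge_overlapping(subcube(hull(b)) for b in blocks)
        if widened == blocks:                                # every block is a subcube
            return blocks
        blocks = widened


p1 = {frozenset({"00", "01"}), frozenset({"10"}), frozenset({"11"})}
p2 = {frozenset({"00"}), frozenset({"01", "11"}), frozenset({"10"})}
print(sorted(sorted(b) for b in merge_overlapping(list(p1) + list(p2))))
# [['00', '01', '11'], ['10']]   (pi1 + pi2, not a rule partition)
print(sorted(sorted(b) for b in oplus(p1, p2)))
# [['00', '01', '10', '11']]
```

Here the non-subcube block {00, 01, 11} of $\pi_1 + \pi_2$ widens to the whole 2-cube, which then swallows {10}.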

Since it is easily shown that $S_r$ and $S_o$ are closed under the operation "⊕", and we can define $\text{l.u.b.}(\pi_1, \pi_2) = \pi_1 \oplus \pi_2$, we obtain the following proposition.
PROPOSITION 3.5.

The sets $(S_r, \cdot, \oplus)$ and $(S_o, \cdot, \oplus)$ are both lattices, and the latter is a sublattice of the former.


REMARK 

The set $(S, \cdot, \oplus)$ is not a lattice, since $\pi_1 \oplus \pi_2$ does not satisfy the $\text{l.u.b.}(\pi_1, \pi_2)$ property.

It is known that every finite lattice has 0 and I elements. The three lattices $(S, \cdot, +)$, $(S_r, \cdot, \oplus)$ and $(S_o, \cdot, \oplus)$ have the same 0 and I elements:

1) The 0 element is the partition consisting of $2^n$ blocks, each block containing exactly one vertex.

2) The I element is the partition consisting of one block containing all $2^n$ vertices, i.e., the n-cube itself.
4. SOME ASPECTS OF DECISION TREE CONSTRUCTION

4.1. Introduction

Known manual methods (Egler [2], Press [25], and Pollack [23]) of converting decision tables into decision trees are based mainly on plausible arguments, with little theoretical background. On the other hand, Reinwald and Soland ([26], [27]) formulated this conversion problem as a problem in mathematical programming, and described a branch-and-bound algorithm for the construction of decision trees which minimizes either the average processing time or the storage requirement.

In this chapter we derive some basic theoretical results concerning optimal decision trees. The argument is developed in terms of n-cube partitions, which are introduced as a simplified mathematical model of decision tables. After the cost of a decision tree is defined, a procedure, called Procedure R, for constructing decision trees from a given partition is presented. Based on this procedure, an algorithm called "iterated local minimization" is proposed. It does not always generate an optimal decision tree, but it yields suboptimal trees which approximately minimize costs. The trees generated by this algorithm are compared quantitatively with optimal trees and with those constructed by Pollack's first algorithm.

This chapter also contains an analysis of "rule splits", in particular lower and upper bounds for the minimum number of required rule splits over all n-cube rule partitions. Since a decision tree which is optimal for one objective function is not necessarily optimal for other objective functions, relationships among optimal decision trees under different objective functions are also discussed in this chapter. Such arguments are based on the partially ordered sets $S_o$ and $S_r$ of n-cube realizable and rule partitions.

4.2. How to Construct Decision Trees

In this section, we describe a procedure, called Procedure R, to construct decision trees which realize partitions $\pi'$, $\pi''$, ... that are refinements of a given partition $\pi$ (i.e., $\pi' \le \pi$, $\pi'' \le \pi$, ...). Algorithms for constructing decision trees which have been proposed in the literature ([23], [25], [26], [27]), as well as the algorithm presented in this section, are essentially based on this procedure. To discuss the optimality of the constructed decision trees, we first introduce the cost of a decision tree as the objective function of our conversion problem.

DEFINITION 4.1.


The  cost  of  a  decision  tree  is  the  number  of  internal  nodes 
in  it,  i.e.,  the  number  of  terminal  nodes  minus  one. 

We note that, if each condition $C_i$ requires the same unit of storage space, this cost coincides with the total storage requirement defined previously. It is obviously also a special case of the average processing time cost. By simplifying these two conventional costs of flowcharts into the above simple cost, we hope to develop more theoretical arguments about optimal decision trees.

As defined earlier, a decision tree is said to realize an n-cube partition $\pi$ if and only if each terminal node of the tree corresponds to a block of the partition $\pi$ (a k-cube, $k \le n$) in a one-to-one manner. There may be more than one decision tree realizing the same partition. The cost of all decision trees realizing the same partition, however, is always the same, since it is simply the number $\#(\pi)$ of blocks of the partition $\pi$ minus one.

In short, if a given partition $\pi$ is realizable, all corresponding decision trees have the same cost, $\#(\pi) - 1$. We will show later in this section that it is easy to check whether or not a given partition $\pi$ is realizable, and to construct decision trees realizing realizable partitions.

On the other hand, if a partition $\pi$ is not realizable, then it does not have any corresponding decision tree, and we must split some blocks $B_i$ of $\pi$ into smaller blocks so that the resultant partition $\pi'$ is realizable. As an example, Figure 4.1 shows a realizable partition $\pi'$ which was obtained by splitting the block $B_i$ of a nonrealizable partition $\pi$ into smaller blocks $B_i'$ and $B_i''$.

FIGURE 4.1.
Then the cost of a decision tree realizing $\pi'$ is $\#(\pi') - 1$, and

$$\#(\pi') - 1 = \#(\pi) - 1 + \{\#(\pi') - \#(\pi)\}.$$

The quantity $\#(\pi') - \#(\pi)$ is called the $\mathrm{loss}(\pi', \pi)$ due to the replacement of the (nonrealizable) partition $\pi$ by the (realizable) partition $\pi'$, where $\pi' \le \pi$.

DEFINITION 4.2.

The minimum cost for a partition $\pi$ is defined to be

$$\min_{\pi'} \{\#(\pi') - 1\} = \#(\pi) - 1 + \min_{\pi'} \mathrm{loss}(\pi', \pi),$$

where the minimum is taken over all realizable partitions $\pi'$ which satisfy $\pi' \le \pi$.

Now we see that our optimization problem is to find a realizable partition $\pi'$ ($\le \pi$) for a given partition $\pi$ such that $\mathrm{loss}(\pi', \pi) = \#(\pi') - \#(\pi)$ is minimized.

Next we describe a procedure to construct a decision tree realizing $\pi'$ for a given partition $\pi$ ($\pi' \le \pi$), and show how to calculate the loss $\#(\pi') - \#(\pi)$. The entire procedure, called Procedure R, is based on the following Operation A, which generates two $(k-1)$-cube partitions from a k-cube partition.

OPERATION  A 

Given a k-cube rule partition $\sigma$ which consists of more than one block, and given a condition $C_s$, where $1 \le s \le k$, define two $(k-1)$-cube rule partitions, $\sigma_0$ and $\sigma_1$, as follows.

For each block $B_i = (x_1, x_2, \ldots, x_s, \ldots, x_k)$ of $\sigma$:

1) if $x_s$ of $B_i$ is 0, then the $(k-1)$-tuple $(x_1, \ldots, x_{s-1}, x_{s+1}, \ldots, x_k)$ is a block of $\sigma_0$;

2) if $x_s$ of $B_i$ is 1, then the $(k-1)$-tuple $(x_1, \ldots, x_{s-1}, x_{s+1}, \ldots, x_k)$ is a block of $\sigma_1$;

3) if $x_s$ is "-", then the $(k-1)$-tuple $(x_1, \ldots, x_{s-1}, x_{s+1}, \ldots, x_k)$ is a block of both $\sigma_0$ and $\sigma_1$; and

4) $\sigma_0$ and $\sigma_1$ have no blocks other than those obtained from 1), 2) and 3) above.


EXAMPLE 4.1.

Consider the following 3-cube partition $\sigma$ over the conditions $C_1$, $C_2$, $C_3$:

$$\sigma = \{B_1 = (0, 0, 0),\ B_2 = (0, 1, 0),\ B_3 = (0, -, 1),\ B_4 = (1, -, -)\}.$$

If we choose $C_1$ as the root of the decision tree, then the corresponding $\sigma_0$ and $\sigma_1$, over the conditions $C_2$, $C_3$, are

$$\sigma_0 = \{(0, 0),\ (1, 0),\ (-, 1)\} \quad \text{and} \quad \sigma_1 = \{(-, -)\},$$

respectively.
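Operation A is easy to express on blocks written as strings over "0", "1" and "-". In this sketch (our own encoding; the name `operation_a` is hypothetical) the condition index is 0-based, so position `s = 0` corresponds to $C_1$. The usage line reproduces Example 4.1.

```python
def operation_a(sigma, s):
    """Split a k-cube rule partition on condition C_{s+1} (0-based position s).

    Blocks are strings over "0", "1", "-"; returns (sigma_0, sigma_1), the
    (k-1)-cube partitions obtained by deleting position s.
    """
    if len(sigma) < 2:
        raise ValueError("Operation A needs a partition with more than one block")
    sigma_0, sigma_1 = [], []
    for block in sigma:
        rest = block[:s] + block[s + 1:]
        if block[s] in "0-":      # rules 1) and 3)
            sigma_0.append(rest)
        if block[s] in "1-":      # rules 2) and 3)
            sigma_1.append(rest)
    return sigma_0, sigma_1


# Example 4.1: choosing C1 (position 0) as the root.
sigma = ["000", "010", "0-1", "1--"]
print(operation_a(sigma, 0))  # (['00', '10', '-1'], ['--'])
```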
Given a partition $\pi$, Procedure R constructs a decision tree which realizes a partition $\pi'$ ($\le \pi$) by applying Operation A repeatedly. For the moment we leave open how Procedure R chooses the condition $C_s$ to be used in Operation A. Various specific choices will be discussed later.

PROCEDURE  R 

Assume a rule partition $\pi$ is given. If $\pi$ consists of a single block, construct a decision tree which consists of a single terminal node. Otherwise, choose a condition $C_s$, derive the two partitions $\pi_0$ and $\pi_1$ by applying Operation A to $\pi$ and $C_s$, and construct a decision tree as follows:

Its root is labeled with condition $C_s$, its left subtree is obtained by applying Procedure R to $\pi_0$, and its right subtree by applying Procedure R to $\pi_1$.
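A recursive sketch of Procedure R, using the same string encoding. The selection rule is passed in as a function `choose`, mirroring the text's decision to leave it open; `procedure_r`, `count_leaves` and the tree encoding (`"leaf"` or a tuple `(condition, left, right)`) are our own hypothetical choices. Always testing the leftmost remaining condition is not a policy from the text, only a simple rule for illustration.

```python
def operation_a(sigma, s):
    """Operation A: split sigma on 0-based position s into (sigma_0, sigma_1)."""
    sigma_0 = [b[:s] + b[s + 1:] for b in sigma if b[s] in "0-"]
    sigma_1 = [b[:s] + b[s + 1:] for b in sigma if b[s] in "1-"]
    return sigma_0, sigma_1


def procedure_r(sigma, choose, names=None):
    """Procedure R on a rule partition given as 0/1/- strings.

    choose(sigma) returns the 0-based position of the condition to test.
    A tree is "leaf" or a tuple (condition_name, left_subtree, right_subtree).
    """
    if names is None:
        names = [f"C{i + 1}" for i in range(len(sigma[0]))]
    if len(sigma) == 1:                      # single block: one terminal node
        return "leaf"
    s = choose(sigma)
    sigma_0, sigma_1 = operation_a(sigma, s)
    remaining = names[:s] + names[s + 1:]    # C_s no longer occurs below this node
    return (names[s],
            procedure_r(sigma_0, choose, remaining),
            procedure_r(sigma_1, choose, remaining))


def count_leaves(tree):
    """Number of terminal nodes; the cost of the tree is this number minus one."""
    if tree == "leaf":
        return 1
    return count_leaves(tree[1]) + count_leaves(tree[2])


# Example 4.1 with a naive rule: always test the leftmost remaining condition.
tree = procedure_r(["000", "010", "0-1", "1--"], lambda sigma: 0)
print(tree)                # ('C1', ('C2', ('C3', 'leaf', 'leaf'), ('C3', 'leaf', 'leaf')), 'leaf')
print(count_leaves(tree))  # 5, one more than the 4 blocks
```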

Then we obtain the following proposition. (The proof is omitted.)

PROPOSITION 4.1.

Procedure R, applied to a partition $\pi$, constructs a decision tree realizing a partition $\pi'$ which satisfies $\pi' \le \pi$.

EXAMPLE 4.2.

Readers are invited to check the above Procedure R against the two examples in Figure 2.3 (a) and (b) in Chapter 2. For case (a), the realized partition is the given partition $\pi$. For case (b), however, Procedure R yields a tree realizing the following partition $\pi'$, which is smaller than the original partition $\pi$.
In Proposition 4.1 it is shown that the tree constructed by Procedure R applied to a partition $\pi$ realizes a partition $\pi'$ which satisfies $\pi' \le \pi$. Now we show how $\mathrm{loss}(\pi', \pi) = \#(\pi') - \#(\pi)$ can be calculated. For this purpose we introduce the following definition.


DEFINITION 4.3.

The loss $\ell(C_s, \sigma)$ due to using $C_s$ when Procedure R is applied to a partition $\sigma$ is defined by

$\ell(C_s, \sigma)$ = the number of blocks $B_i = (x_1, x_2, \ldots, x_k)$ of $\sigma$ for which $x_s$ is "-".

We call $C_s$ a lossfree condition with respect to $\sigma$ if and only if $\ell(C_s, \sigma) = 0$ holds. On the other hand, if all blocks $B_i = (x_1, x_2, \ldots, x_k)$ of $\sigma$ have $x_s$ = "-", then the condition $C_s$ is called inessential to $\sigma$.
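Definition 4.3 translates into one-line checks on the string encoding used in the sketches above (names hypothetical, positions 0-based). For the partition of Example 4.1 the losses of $C_1$, $C_2$, $C_3$ are 0, 2 and 1, so $C_1$ is lossfree.

```python
def loss(sigma, s):
    """l(C_s, sigma): the number of blocks with "-" at position s (0-based)."""
    return sum(1 for block in sigma if block[s] == "-")


def is_lossfree(sigma, s):
    """C_s is lossfree with respect to sigma when l(C_s, sigma) = 0."""
    return loss(sigma, s) == 0


def is_inessential(sigma, s):
    """C_s is inessential to sigma when every block has "-" at position s."""
    return all(block[s] == "-" for block in sigma)


sigma = ["000", "010", "0-1", "1--"]          # Example 4.1
print([loss(sigma, s) for s in range(3)])      # [0, 2, 1]
```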


PROPOSITION 4.2.

The cost of the decision tree realizing $\pi'$ constructed by the above Procedure R is given by

$$\#(\pi') - 1 = \#(\pi) - 1 + \{\#(\pi') - \#(\pi)\} = \#(\pi) - 1 + \sum_i \ell(C_{s_i}, \sigma_i),$$

and, therefore, the loss is expressed by

$$\mathrm{loss}(\pi', \pi) = \sum_i \ell(C_{s_i}, \sigma_i),$$

where the sum $\sum_i \ell(C_{s_i}, \sigma_i)$ (called simply the total loss) is taken over all internal nodes $i$ of the decision tree constructed by Procedure R; $\sigma_i$ is the partition to which Operation A is applied at node $i$, and $C_{s_i}$ is the condition chosen there.

The proof is not given. However, consider the examples in Figure 2.3 (a) and (b) in Chapter 2 again. In (a), the conditions chosen at each step are all lossfree conditions with respect to the corresponding partitions; therefore, the total loss is zero. On the other hand, in (b), the condition at the root generates a loss of one, and the rest of the internal nodes generate no losses. That is, the total loss $\sum_i \ell(C_{s_i}, \sigma_i)$ is equal to one. Indeed, it is shown in Example 4.2 that $\mathrm{loss}(\pi', \pi)$ is equal to one for (b).
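Proposition 4.2 can also be checked numerically on small cases. The sketch below (hypothetical names; string-encoded blocks) runs Procedure R, adds up $\ell(C_{s_i}, \sigma_i)$ over the internal nodes, and compares the total with $\#(\pi') - \#(\pi)$, i.e. the number of terminal nodes minus the number of blocks. The 3-cube partition `pinwheel` is our own example, not one from the text: every condition has a dash in some block, so it has no lossfree condition.

```python
def run_procedure_r(sigma, choose):
    """Run Procedure R and return (number of terminal nodes, total loss).

    The total loss adds l(C_s, sigma_i) over every internal node i.
    """
    if len(sigma) == 1:
        return 1, 0
    s = choose(sigma)
    leaves = 0
    total = sum(1 for b in sigma if b[s] == "-")     # l(C_s, sigma_i) at this node
    for half in ("0-", "1-"):                        # Operation A, both halves
        part = [b[:s] + b[s + 1:] for b in sigma if b[s] in half]
        part_leaves, part_loss = run_procedure_r(part, choose)
        leaves += part_leaves
        total += part_loss
    return leaves, total


# A nonrealizable 3-cube partition (our own example).
pinwheel = ["00-", "1-0", "-11", "010", "101"]
leaves, total = run_procedure_r(pinwheel, lambda sigma: 0)
print(leaves, total)                 # 7 2
print(leaves - len(pinwheel) == total)  # True, as Proposition 4.2 states
```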

Now the problem of constructing an optimal decision tree from a given partition is reduced to finding a suitable condition $C_s$ at each step of Procedure R so that the total loss $\sum_i \ell(C_{s_i}, \sigma_i)$ is minimized.
So far, no method has been proposed which guarantees minimizing the total loss. One plausible method is Pollack's first algorithm, and another is the iterated local minimization, which is explained in the following. Before we introduce this algorithm, we present one very basic theorem.

THEOREM 4.1.

Assume there exists a lossfree condition $C_i$ for a given partition $\pi$. Then any decision tree with the root $C_k$ ($k \ne i$), realizing $\pi'$ ($\le \pi$), can be transformed into a decision tree with the root $C_i$ while preserving or reducing the cost of the original tree.

PROOF: Consider a decision tree T with the root labeled $C_k$ ($k \ne i$), which realizes a partition $\pi'$ ($\le \pi$). Since $\ell(C_i, \pi) = 0$, it is easy to see that $C_i$ appears in every path from the root to a terminal node of T. Now consider the lowest level at which an internal node $C_i$ appears. We note that, by the property of a decision tree (see Chapter 3), no other $C_i$ can appear in the path from the root to this $C_i$. Then we claim that its neighboring node at the same level, which has the same predecessor $C_j$, must also be $C_i$, because

1) if it is a terminal node, this contradicts the fact that $\ell(C_i, \pi) = 0$, since no $C_i$ appears in the path from the root to this terminal node, and

2) if it is $C_m$ ($m \ne i$), then $C_i$ must appear at a lower level of the tree, since no $C_i$ appears in the path from the root to this $C_m$.

Therefore, we have a pair of $C_i$'s at the same level which have the same predecessor $C_j$, as shown below.

[Figure: the node $C_j$ with its two $C_i$ children and their subtrees $T_1$ through $T_4$]
Next we replace the $C_j$ by $C_i$ and the two $C_i$'s by two $C_j$'s. At the same time, two of the four subtrees of T shown above ($T_1$ through $T_4$), namely $T_2$ and $T_3$, are exchanged. Then it is easy to see that the following modified decision tree also realizes the same partition $\pi'$.
Let $\sigma_0$ and $\sigma_1$ denote the two partitions which are realized by the left and right subtrees following the new $C_i$ of the modified tree, respectively. If it happens that the condition $C_j$ is inessential to $\sigma_0$ ($\sigma_1$), then the p-cube partition $\sigma_0$ ($\sigma_1$) degenerates to the $(p-1)$-cube partition $\sigma_0'$ ($\sigma_1'$) by removing the "-" in the j-th position from the p-tuple of each block of $\sigma_0$ ($\sigma_1$). (We note that if $C_i$ is a lossfree condition with respect to $\pi$, then $C_i$ is always essential at each step of Procedure R; therefore such degeneration does not occur for $C_i$.) If such a degeneration occurs for $\sigma_0$ ($\sigma_1$), then $C_j$ can be eliminated from its corresponding subtree. The tree with $C_j$ eliminated realizes a different partition $\pi''$, but $\pi'' \le \pi$ still holds, and it is easy to show that $\pi' \le \pi''$. In other words, we can reduce the cost by one (or by two, if both $C_j$'s are eliminated). When such degeneration does not occur, the cost of the modified tree remains the same as that of the original tree. By applying this argument successively, we can move the lossfree condition $C_i$ from lower levels to higher levels while preserving or even reducing the cost of the tree. Finally, we obtain a decision tree with the root $C_i$ whose cost is less than or equal to that of the original tree with the root $C_k$. Q.E.D.

The above theorem concerns the root of an entire decision tree. In a similar way, however, the argument can be applied generally to the construction of subtrees, i.e., to choosing a condition $C_s$ for $\sigma$ at each step of Procedure R.


REMARK 

Using  Pollack's  terminology,  this  theorem  says  that  the 
condition  row  with  dash  count  0  should  be  chosen  first. 

We can now state the following policy for the selection of a condition $C_s$ for a partition $\sigma$ at each step of Procedure R.

LOSSFREE-CONDITION-FIRST  (LCF)  Policy 

At  each  stage  of  Procedure  R,  select  a  lossfree  condition,  if 
one  exists. 

Using the above concept, the LCF policy, we can easily characterize the realizability of a partition and show how to construct decision trees realizing a realizable partition, as follows.

PROPOSITION 4.3.

A partition $\pi$ is realizable if and only if we can choose a lossfree condition at each step of Procedure R. Moreover, assume Procedure R with the LCF policy is applied to a realizable partition $\pi$. Then all decision trees constructed by this procedure with the LCF policy realize the same partition $\pi$, and all such trees have the same cost. That is, choosing one particular condition out of all lossfree conditions with respect to $\sigma_i$ at each step of Procedure R influences neither the realizability of the partition $\pi$ nor the cost of the decision trees.

The proof is omitted.
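Proposition 4.3 suggests a simple realizability test: keep applying Operation A with a lossfree condition, and report failure as soon as none exists. This sketch (hypothetical names, string-encoded blocks) takes the first lossfree condition it finds, which according to the proposition is as good as any other; `pinwheel` is our own nonrealizable 3-cube example, not one from the text.

```python
def is_realizable(sigma):
    """Realizable iff a lossfree condition exists at every step (Proposition 4.3)."""
    if len(sigma) == 1:
        return True
    for s in range(len(sigma[0])):
        if all(block[s] != "-" for block in sigma):          # C_s is lossfree
            sigma_0 = [b[:s] + b[s + 1:] for b in sigma if b[s] == "0"]
            sigma_1 = [b[:s] + b[s + 1:] for b in sigma if b[s] == "1"]
            return is_realizable(sigma_0) and is_realizable(sigma_1)
    return False                                              # no lossfree condition


print(is_realizable(["000", "010", "0-1", "1--"]))          # True  (Example 4.1)
print(is_realizable(["00-", "1-0", "-11", "010", "101"]))   # False (pinwheel)
```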

Now we propose the iterated local minimization. As shown in Proposition 4.3 above, there is no difficulty in constructing decision trees if a given partition is realizable. According to Proposition 4.2, when Procedure R is applied to a nonrealizable partition $\pi$, $\mathrm{loss}(\pi', \pi)$ can be expressed by $\sum_i \ell(C_{s_i}, \sigma_i)$. One plausible way to minimize this loss is to choose, at each step, a condition $C_s$ such that the loss $\ell(C_s, \sigma_i)$ is minimized.

ALGORITHM  (The  iterated  local  minimization) 

At each step of Procedure R, choose $C_s$ such that $\ell(C_s, \sigma)$ is a minimum over all $\ell(C_j, \sigma)$ for all possible choices $C_j$. In other words, this algorithm consists of the following two cases.

Case 1) If there exists a lossfree condition $C_s$ with respect to $\sigma$, then choose $C_s$ as the root of the current decision tree. If several such lossfree conditions exist, then choose any of them (the LCF policy).

Case 2) If there is no lossfree condition, then choose any $C_s$ whose $\ell(C_s, \sigma)$ is a minimum among all possible conditions.

REMARK 

In the terminology of decision tables, cases 1) and 2) can be restated as follows. For a given decision table, first select a condition row having no dash entry, if possible. If such a row does not exist, select a condition row having the fewest dash entries.
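A compact sketch of the iterated local minimization on the string encoding: at each node it computes $\ell(C_s, \sigma)$ for every remaining condition and takes a minimizer, which automatically covers case 1) whenever the minimum is 0. Breaking ties by the leftmost condition is our own assumption (the algorithm allows any choice), and the names are hypothetical.

```python
def iterated_local_minimization(sigma, names=None):
    """Procedure R with the iterated local minimization rule.

    Returns (tree, total_loss); a tree is "leaf" or (condition_name, left, right).
    Ties between equally small losses go to the leftmost condition (an assumption).
    """
    if names is None:
        names = [f"C{i + 1}" for i in range(len(sigma[0]))]
    if len(sigma) == 1:
        return "leaf", 0
    losses = [sum(1 for b in sigma if b[s] == "-") for s in range(len(sigma[0]))]
    s = losses.index(min(losses))         # case 1) if the minimum is 0, else case 2)
    remaining = names[:s] + names[s + 1:]
    left, loss_0 = iterated_local_minimization(
        [b[:s] + b[s + 1:] for b in sigma if b[s] in "0-"], remaining)
    right, loss_1 = iterated_local_minimization(
        [b[:s] + b[s + 1:] for b in sigma if b[s] in "1-"], remaining)
    return (names[s], left, right), losses[s] + loss_0 + loss_1


tree, total = iterated_local_minimization(["00-", "1-0", "-11", "010", "101"])
print(tree)
print(total)  # 1
```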

Readers should notice the difference between Pollack's first algorithm ([23]) and the above iterated local minimization. Pollack's first algorithm uses the dash count (a weighted sum of the number of dashes in a condition row) as the criterion for selecting a condition $C_s$ at each step of Procedure R, and chooses a condition whose dash count is a minimum. (The iterated local minimization, on the other hand, simply counts the dashes in a condition row and chooses a condition whose number of dashes is a minimum among all possible choices of conditions.) The weight $2^{k_i}$ in the dash count is the number of 0-cubes in a rule (i.e., block) that is a $k_i$-subcube. Then the dash count of a condition $C_s$ is the sum $\sum_i 2^{k_i}$, where the indices $i$ are taken over all $k_i$-subcube blocks which are split by the condition $C_s$.
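The two selection criteria can be contrasted in a few lines. In this sketch (hypothetical names) a block with $k_i$ dashes is treated as a $k_i$-subcube, so it contributes $2^{k_i}$ to the dash count of every condition at which it has a dash, while the iterated local minimization counts each such block only once.

```python
def dash_count(sigma, s):
    """Pollack-style dash count of C_s: sum of 2**k_i over blocks split by C_s.

    A block with k_i dashes is a k_i-subcube containing 2**k_i vertices (0-cubes).
    """
    return sum(2 ** block.count("-") for block in sigma if block[s] == "-")


def dash_number(sigma, s):
    """Plain number of dashes in the condition row (iterated local minimization)."""
    return sum(1 for block in sigma if block[s] == "-")


sigma = ["000", "010", "0-1", "1--"]                   # Example 4.1
print([dash_count(sigma, s) for s in range(3)])       # [0, 6, 4]
print([dash_number(sigma, s) for s in range(3)])      # [0, 2, 1]
```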

Now we return to the iterated local minimization. Unfortunately, this algorithm does not always guarantee optimal decision trees, because of case 2) of the algorithm. We show this by the following Example 4.3.

EXAMPLE 4.3.

Consider the following partition $\pi$.
Both the iterated local minimization and Pollack's first algorithm generate the same decision tree, $T_1$ of Figure 4.2 (a); both choose the same condition first as the root of the decision tree. The cost of $T_1$ is 12.

FIGURE 4.2 (a).

FIGURE 4.2 (b).
The decision tree $T_2$ of Figure 4.2 (b) is the optimal tree for this nonrealizable partition $\pi$; its cost is 11. Figure 4.3 shows the two partitions $\pi'$ and $\pi''$ realized by $T_1$ and $T_2$, respectively.
FIGURE 4.3 (a).

FIGURE 4.3 (b).

We note that $\mathrm{loss}(\pi', \pi)$ and $\mathrm{loss}(\pi'', \pi)$ are 3 and 2, respectively.


PROPOSITION 4.4.

The  iterated  local  minimization,  as  well  as  Pollack's  first 
algorithm,  does  not  always  yield  an  optimal  decision  tree. 
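Small cases like these can be checked against a brute-force optimum. The sketch below is ours and not a method from the text: relying on the reduction of optimal tree construction to the choice of a condition at each step of Procedure R, it tries every condition at every step and memoizes the best total loss of each subpartition. Its running time grows exponentially, so it serves only as a reference for tiny examples; `min_total_loss` and the `pinwheel` partition are hypothetical.

```python
from functools import lru_cache


def min_total_loss(sigma):
    """Smallest total loss over all condition choices in Procedure R (exhaustive)."""

    @lru_cache(maxsize=None)
    def best(part):
        if len(part) == 1:
            return 0
        candidates = []
        for s in range(len(part[0])):
            here = sum(1 for b in part if b[s] == "-")
            part_0 = tuple(sorted(b[:s] + b[s + 1:] for b in part if b[s] in "0-"))
            part_1 = tuple(sorted(b[:s] + b[s + 1:] for b in part if b[s] in "1-"))
            candidates.append(here + best(part_0) + best(part_1))
        return min(candidates)

    return best(tuple(sorted(sigma)))


print(min_total_loss(["000", "010", "0-1", "1--"]))        # 0: Example 4.1 is realizable
print(min_total_loss(["00-", "1-0", "-11", "010", "101"]))  # 1
```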

4.3. Comparative Study of Algorithms

We have seen that the iterated local minimization and Pollack's first algorithm do not always yield an optimal decision tree. In this section, we compare the trees generated by the iterated local minimization and by Pollack's algorithm with optimal trees, and we estimate how far from optimal the trees generated by these two algorithms can be. To make the comparative arguments more concise, we prepare the following definition and give a very basic theorem.


DEFINITION 4.4.

For a k-cube partition $\pi_k$, its n-cube extension $\pi_n$ ($k < n$) is defined as follows. For each block $B_i = (x_1, x_2, \ldots, x_k)$ of $\pi_k$, we form $2^{n-k}$ blocks of $\pi_n$ by adding $(n-k)$ new coordinates $C_{k+1}, C_{k+2}, \ldots, C_n$ to the tuple, one block for each assignment of 0s and 1s to the new coordinates, i.e., $(x_1, \ldots, x_k, 0, 0, \ldots, 0)$, ..., $(x_1, \ldots, x_k, 1, 1, \ldots, 1)$. Then $\#(\pi_n)$ is equal to $2^{n-k} \cdot \#(\pi_k)$.

EXAMPLE 4.4.

As an example, the construction from $\pi_k$ to its n-cube extension $\pi_n$ is shown below, where $k = 3$ and $n = 4$.
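Definition 4.4 as a short sketch on the string encoding (hypothetical name `extension`). Since the partition of Example 4.4 is not reproduced here, the usage applies the construction to the 3-cube partition of Example 4.1 instead.

```python
from itertools import product


def extension(pi_k, n):
    """n-cube extension of a k-cube partition (Definition 4.4).

    Each block is copied 2**(n-k) times, once per 0/1 assignment to C_{k+1}, ..., C_n.
    """
    k = len(pi_k[0])
    return [block + "".join(bits) for block in pi_k for bits in product("01", repeat=n - k)]


pi_3 = ["000", "010", "0-1", "1--"]           # Example 4.1 partition, k = 3
pi_4 = extension(pi_3, 4)
print(pi_4[:4])                               # ['0000', '0001', '0100', '0101']
print(len(pi_4) == 2 ** (4 - 3) * len(pi_3))  # True
```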


THEOREM 4.2.

Assume an algorithm A is applied to a k-cube partition $\pi_k$ and constructs a decision tree realizing $\pi_k'$. Then, as we defined previously, $\mathrm{loss}(\pi_k', \pi_k)$ is $\#(\pi_k') - \#(\pi_k)$. When the same algorithm A is applied to the n-cube extension $\pi_n$ of $\pi_k$, the loss is given by

$$\mathrm{loss}(\pi_n', \pi_n) = \#(\pi_n') - \#(\pi_n) \ge 2^{n-k} \cdot \mathrm{loss}(\pi_k', \pi_k) = 2^{n-k} \, \{\#(\pi_k') - \#(\pi_k)\},$$

where $\pi_n'$ is the partition realized by the resultant decision tree.
n 

PROOF: The algorithm A may or may not have the LCF policy. If it has the LCF policy, then it can choose the newly added coordinates $C_{k+1}, C_{k+2}, \ldots, C_n$ at the first $2^{n-k} - 1$ steps of Procedure R applied to the n-cube extension $\pi_n$, because these conditions are lossfree at each of these steps. In practice, this means that the n-cube partition $\pi_n$ is divided into $2^{n-k}$ copies of the k-cube partition $\pi_k$. For each k-cube partition, the algorithm generates $\mathrm{loss}(\pi_k', \pi_k) = \#(\pi_k') - \#(\pi_k)$; therefore, in total, this algorithm generates $\mathrm{loss}(\pi_n', \pi_n) = 2^{n-k} \cdot \mathrm{loss}(\pi_k', \pi_k)$ for the n-cube extension $\pi_n$. If the algorithm A does not have the LCF policy, then, according to Theorem 4.1, it generates at least as much loss as the above $\mathrm{loss}(\pi_n', \pi_n)$. Q.E.D.

This theorem says that the loss generated by an algorithm for a k-cube partition is magnified by the factor $2^{n-k}$ for its n-cube extension.
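The magnification can be observed on a small case. The sketch below (hypothetical names) uses the iterated local minimization with leftmost tie-breaking, which follows the LCF policy, and applies it to n-cube extensions of our own nonrealizable 3-cube partition `pinwheel`, whose loss under this rule is 1. The added coordinates are lossfree, so the loss doubles with each new coordinate.

```python
from itertools import product


def ilm_loss(sigma):
    """Total loss of Procedure R under iterated local minimization (leftmost tie-break)."""
    if len(sigma) == 1:
        return 0
    losses = [sum(1 for b in sigma if b[s] == "-") for s in range(len(sigma[0]))]
    s = losses.index(min(losses))
    sigma_0 = [b[:s] + b[s + 1:] for b in sigma if b[s] in "0-"]
    sigma_1 = [b[:s] + b[s + 1:] for b in sigma if b[s] in "1-"]
    return losses[s] + ilm_loss(sigma_0) + ilm_loss(sigma_1)


def extension(pi_k, n):
    """n-cube extension: append every 0/1 assignment to the new coordinates."""
    k = len(pi_k[0])
    return [b + "".join(bits) for b in pi_k for bits in product("01", repeat=n - k)]


pinwheel = ["00-", "1-0", "-11", "010", "101"]   # k = 3
for n in (3, 4, 5, 6):
    print(n, ilm_loss(extension(pinwheel, n)))   # 3 1, then 4 2, 5 4, 6 8
```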


COROLLARY 4.3.

Assume that two algorithms, A and B, have the LCF policy. If A and B generate $\mathrm{loss}(\pi_k', \pi_k)$ and $\mathrm{loss}(\pi_k'', \pi_k)$, respectively, for a k-cube partition $\pi_k$, then for any $n$ ($> k$) there exists an n-cube partition for which A and B generate $2^{n-k} \cdot \mathrm{loss}(\pi_k', \pi_k)$ and $2^{n-k} \cdot \mathrm{loss}(\pi_k'', \pi_k)$, respectively. In other words, the difference of the losses generated by A and B is

$$2^{n-k} \cdot \{\mathrm{loss}(\pi_k', \pi_k) - \mathrm{loss}(\pi_k'', \pi_k)\}.$$

PROOF: If we consider the n-cube extension $\pi_n$ of the k-cube partition $\pi_k$, then the above statement can be derived directly from Theorem 4.2. Q.E.D.

The corollary says that, if there is a difference $d$ in the losses generated by two algorithms with the LCF policy for a k-cube partition, then its n-cube extension makes the difference in the losses generated by A and B equal to $2^{n-k} \cdot d$.

We now apply this corollary to compare the losses of various trees.


THEOREM 4.4.

For any $n > k$, there exists an n-cube partition for which the cost of an optimal decision tree is $2^{n-k}$ less than the cost of a decision tree constructed by the iterated local minimization or by Pollack's first algorithm.

PROOF: Consider the 4-cube partition $\pi$ in Example 4.4. Since the decision tree constructed by the iterated local minimization or by Pollack's first algorithm differs from the optimal one by one in terms of loss, the difference of loss becomes $2^{n-k}$ for its n-cube extension, according to Corollary 4.3. Q.E.D.

Secondly,  we  compare  Pollack's  first  algorithm  with  the 
iterated  local  minimization. 
THEOREM 4.5.

For any n > 5, there exists an n-cube partition for which the cost of the decision tree constructed by Pollack's first algorithm is larger by $2\cdot 2^{n-5}$ than the cost of a decision tree constructed by the iterated local minimization.

PROOF:   Pollack's first algorithm generates a loss of 3 for the 5-cube partition of Figure 4.4. (It chooses C first as the root of the decision tree.)


FIGURE 4.4.


The iterated local minimization, however, generates a loss of 1. It chooses C first as the root of the decision tree. Since the LCF policy can be applied at each step of the procedure, we obtain, for the n-cube extension, the difference of loss in the statement, i.e.,

$$2^{n-5}\cdot(3-1) = 2\cdot 2^{n-5}.$$

Q.E.D.
Finally, we show that the iterated local minimization is not always better than Pollack's algorithm.

THEOREM 4.6.

For any n > 5, there exists an n-cube partition for which the cost of the decision tree constructed by the iterated local minimization is larger by $2\cdot 2^{n-5}$ than the cost of the decision tree obtained by Pollack's first algorithm.


PROOF:   The iterated local minimization generates a loss of 5 for the 5-cube partition of Figure 4.5. That is, it first chooses C as the root of the decision tree, generating a loss of 2 at this first step. After that it generates losses of 1 and 2 for the resulting two 4-cube partitions, respectively. The total loss, therefore, is equal to 5.


FIGURE  4.5. 


On the other hand, Pollack's algorithm chooses C first, which generates a loss of 3 at this step. After that, it generates no loss. Therefore, the difference of loss between the two algorithms for this partition is $5 - 3 = 2$. Using Corollary 4.3, the difference $2\cdot 2^{n-5}$ is obtained for the n-cube extension. Q.E.D.
4.4.   Bounds of Minimum Total Loss

Associated with the concept of total loss for nonrealizable partitions, one interesting question arises: how much total loss must be generated by an optimal algorithm? Notationally, this quantity is characterized by

$$L(n) = \max_{\pi}\min_{\pi'}\mathrm{loss}(\pi', \pi) = \max_{\pi}\min_{\pi'}\{\#(\pi') - \#(\pi) \mid \pi' \le \pi \text{ and } \pi' \text{ is realizable}\},$$

where the max is taken over all n-cube partitions $\pi$.

THEOREM  4.7. 

The quantity L(n) is bounded by

$$2^{n-3} \le L(n) \le 2^{n-1}\cdot\log\frac{n}{2},$$

where log denotes the natural logarithm.

PROOF:   a) Lower bound. Consider the 3-cube partition in Example 3.3 in Chapter 3. The total loss associated with an optimal decision tree for this partition is equal to one. According to Corollary 4.3, an optimal algorithm generates at least a loss of $2^{n-3}$ for its n-cube extension. Therefore, we obtain the lower bound $2^{n-3}$ (which is existential) for L(n).

b) Upper bound. For the proof of this bound, we use the following lemma.


LEMMA

For any step of Procedure R, at most a loss of $\lfloor 2^{k-1}/k \rfloor$ is generated ($k \ge 3$), i.e.,

$$\ell(C_s, \tau) \le \lfloor 2^{k-1}/k \rfloor,$$

where $\lfloor x \rfloor$ is the greatest integer that is less than or equal to x, and the partition $\tau$, for which we construct a subtree at this step, is a k-cube partition.

The proof of this lemma is omitted. Instead, for k = 3, 4 and 5, we show in Example 4.5 a k-cube partition which achieves $\lfloor 2^{k-1}/k \rfloor$.


EXAMPLE 4.5.

The following 3-, 4- and 5-cube partitions are examples which generate a loss of $\lfloor 2^{k-1}/k \rfloor$ for k = 3, 4 and 5 (that is, a loss of 1, 2 and 3, respectively) at one step.

[Figure: the 3-, 4- and 5-cube partitions of Example 4.5]
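Now we prove the upper bound based on the above lemma. Assume that we have an n-cube partition for which a loss of $\lfloor 2^{k-1}/k \rfloor$ is generated at each step of Procedure R, k being the dimension of the partition handled at that step. Since each step on a k-cube partition produces two (k-1)-cube partitions, there are at most $2^j$ steps on (n-j)-cube partitions, and the lemma applies down to k = 3. Then, for this partition, the total loss is given by

$$\lfloor 2^{n-1}/n \rfloor + 2\cdot\lfloor 2^{n-2}/(n-1) \rfloor + \cdots + 2^{n-3}\cdot\lfloor 2^{2}/3 \rfloor$$

and bounded by

$$\begin{aligned}
&\le \frac{2^{n-1}}{n} + 2\cdot\frac{2^{n-2}}{n-1} + 2^{2}\cdot\frac{2^{n-3}}{n-2} + \cdots + 2^{n-3}\cdot\frac{2^{2}}{3}\\
&= 2^{n-1}\cdot\Big\{\frac{1}{n} + \frac{1}{n-1} + \frac{1}{n-2} + \cdots + \frac{1}{3}\Big\}\\
&< 2^{n-1}\int_{2}^{n}\frac{dx}{x} = 2^{n-1}\cdot(\log n - \log 2) = 2^{n-1}\cdot\log\frac{n}{2}.
\end{aligned}$$

Q.E.D.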
In practice, for the case n = 10, the lower bound is $2^7$. This means, in conventional terminology, that there exists a decision table with 10 condition rows for which an optimal algorithm splits at least $2^7 = 128$ rules in total. The total loss is, however, bounded above by $2^9\cdot\log 5 \approx 824$.
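As a numerical check on Theorem 4.7, the following Python sketch evaluates the lower bound, the exact sum of the lemma's per-step bounds used in the proof, and the closed-form upper bound. The helper names are ours; log is taken as the natural logarithm, which is what the integral in the proof produces.

```python
import math


def lower_bound(n: int) -> int:
    """Theorem 4.7(a): L(n) >= 2**(n - 3)."""
    return 2 ** (n - 3)


def step_bound(k: int) -> int:
    """Lemma: one step of Procedure R on a k-cube partition loses at most floor(2**(k-1)/k)."""
    if k < 3:
        raise ValueError("the lemma is stated for k >= 3")
    return 2 ** (k - 1) // k


def floor_sum_bound(n: int) -> int:
    """Total loss if every step hits the lemma's bound.

    At depth j there are at most 2**j steps, each on an (n - j)-cube partition;
    the sum runs down to 3-cubes (j = 0, ..., n - 3).
    """
    return sum(2 ** j * step_bound(n - j) for j in range(n - 2))


def upper_bound(n: int) -> float:
    """Theorem 4.7(b): 2**(n - 1) * log(n / 2), natural logarithm."""
    return 2 ** (n - 1) * math.log(n / 2)


n = 10
print(lower_bound(n))            # 128
print(floor_sum_bound(n))        # 675
print(round(upper_bound(n), 1))  # 824.0
```

For n = 10 the sum of per-step bounds is 675, between the lower bound 128 and the closed-form bound of about 824; the integral step in the proof trades some tightness for a simple formula.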

4.5.   Optimality Discussions for Different Objective Functions

So far we have worked only with the objective function $\#(\pi)$. There are, however, two other criteria to be considered, the total memory space requirement, M, and the average processing time, P, which are briefly described in Chapter 2. Obviously, $\#(\pi)$ is a simplified and special case of M and P. However, one question arises: what kind of relationship exists among the optimal decision trees which minimize different objective functions? More simply, does an optimal decision tree for one objective function minimize the other two objective functions?
In this section we develop such arguments on optimality for different objective functions, based on two partially ordered sets, $S_0$ and $S_r$, of all realizable and rule partitions, respectively. Moreover, we show how Procedure R and related algorithms work and/or should be modified for those two other objective functions. For simplicity, the three different objective functions are called #-cost, M-cost, and P-cost, respectively.

STATEMENT

For a given partition $\pi \in S_r$, where do optimal solutions exist in $S_0$ for #-cost, M-cost and P-cost, respectively? How can we relate one optimal solution to the rest? How well do the procedures or algorithms proposed so far for minimizing #-cost work for M-cost and/or P-cost?

This section answers these questions to some extent.

We first give a brief review of the properties of these three costs. Recall that a decision tree is defined only for a realizable partition $\pi$, and we denote it by $T(\pi)$. $T(\pi)$ is read "a decision tree realizing a partition $\pi$".
a)  #-cost

The #-cost, $C(T(\pi))$, of a decision tree $T(\pi)$ realizing $\pi$ is the number of internal nodes of the tree, and it is equal to the number of blocks of $\pi$ minus one, i.e.,

$$C(T(\pi)) = \#(\pi) - 1.$$

Therefore, the #-cost of all decision trees realizing $\pi$ is the same. In other words, this cost is independent of the decision tree realizing $\pi$ and depends only on the partition $\pi$. So we define the #-cost, $C(\pi)$, for a realizable partition $\pi$ by

$$C(\pi) = \#(\pi) - 1.$$

(Do not confuse $C(\pi)$ with $\#(\pi)$. The former is the #-cost, and the latter is the number of blocks of $\pi$.)

It is straightforward to extend the above definition to the case of a nonrealizable partition $\pi$. Thus, more generally, we define the #-cost, $C(\pi)$, of a rule partition $\pi$ ($\in S_r$) by

$$C(\pi) = \#(\pi) - 1.$$

b)  M-cost

M-cost, on the other hand, can be defined only for a realizable partition $\pi$ ($\in S_0$) and a decision tree $T(\pi)$. The M-cost, $C_M(T(\pi))$, of a decision tree $T(\pi)$ is defined by

$$C_M(T(\pi)) = \sum_i s_i,$$

where $s_i$ is the storage space required for a condition $C_i$, and the sum $\sum_i s_i$ is over all internal nodes i of $T(\pi)$.

Then it is easy to see that the M-cost of different decision trees realizing the same partition $\pi$ ($\in S_0$) may differ, i.e.,

$$C_M(T_1(\pi)) \ne C_M(T_2(\pi)).$$

Consider $\pi$, $T_1$ and $T_2$ in the Remark after Example 3.3. $T_1$ and $T_2$ realize the same partition $\pi$, but the M-costs of those trees are $C_M(T_1(\pi)) = s_1 + 3s_2 + s_3$ and $C_M(T_2(\pi)) = 2s_1 + s_2 + 2s_3$, respectively. It is concluded that M-cost cannot be defined for a partition itself.

c)  P-cost

First we assume a partition $\pi$ ($\in S_0$). A probability, $\Pr(v)$, of occurrence for every vertex v of $\pi$ is given and fixed. Then the probability of occurrence of a block B of $\pi$ is simply calculated by

$$\Pr(B) = \sum_{v \in B}\Pr(v).$$

Then the P-cost, $C_P(T(\pi))$, of a decision tree $T(\pi)$ realizing $\pi$ is defined by

$$C_P(T(\pi)) = \sum_i \Pr(B_i)\cdot\Big(\sum_k t_{i_k}\Big),$$

where $B_i$ is the block of $\pi$ corresponding to a terminal node i of T; $t_k$ is the time required to process a condition $C_k$; and the sum $\sum_k t_{i_k}$ is taken over all internal nodes $C_{i_k}$ along the path from the root to the terminal node i of the tree. Examples are seen in Section 2.2.

Now we show a different way to calculate the P-cost without constructing a decision tree.


THEOREM 4.8.

Let a partition $\pi$ have blocks $B_i = (x_{i_1}, x_{i_2}, \ldots, x_{i_n})$, where $x_{i_j} = 1$, 0 or "-". For any decision tree $T(\pi)$ realizing $\pi$ ($\in S_0$), its cost $C_P(T(\pi))$ is equal to

$$C_P(T(\pi)) = \sum_i \Pr(B_i)\cdot\Big\{\sum_{k:\ x_{i_k} \in \{0,1\}} t_k\Big\},$$

where the inner sum is taken over the indices k such that $x_{i_k}$ is 1 or 0 (but not "-") in the n-tuple $B_i = (x_{i_1}, x_{i_2}, \ldots, x_{i_n})$.

The proof is omitted.
EXAMPLE 4.6.

To verify the theorem, consider the following partition $\pi$ and the tree $T(\pi)$ realizing $\pi$:

$B_1 = \{(-,-,1)\}$, $B_2 = \{(-,1,0)\}$, $B_3 = \{(0,0,0)\}$, $B_4 = \{(1,0,0)\}$.

[Figure: the tree $T(\pi)$, with $C_3$ at the root, $C_2$ below it and $C_1$ at the bottom]

$C_P(T(\pi))$ can be calculated by its definition as follows:

$$C_P(T(\pi)) = \Pr(B_1)\cdot t_3 + \Pr(B_2)\cdot(t_3 + t_2) + \Pr(B_3)\cdot(t_3 + t_2 + t_1) + \Pr(B_4)\cdot(t_3 + t_2 + t_1).$$

On the other hand, the right-hand side of the equality in the above theorem is calculated as follows. The index sets $I_i$ of $B_i$ for i = 1, 2, 3 and 4 are {3}, {2,3}, {1,2,3} and {1,2,3}, respectively. Then the terms of the right-hand side are $\Pr(B_1)\cdot t_3$, $\Pr(B_2)\cdot(t_2 + t_3)$, $\Pr(B_3)\cdot(t_1 + t_2 + t_3)$ and $\Pr(B_4)\cdot(t_1 + t_2 + t_3)$, and it is easy to see that this sum is equal to $C_P(T(\pi))$.
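The equality of Theorem 4.8 can also be checked mechanically. The Python sketch below encodes each block as a string over '0', '1' and '-', a tree as nested `Node` objects, and evaluates the P-cost of Example 4.6 both along root-to-leaf paths and from the fixed positions of each block, taking $B_1$ to be (-,-,1), the block that completes the partition. The encoding, the uniform vertex probabilities and the times $t_1 = 1$, $t_2 = 2$, $t_3 = 3$ are hypothetical choices for illustration only.

```python
from dataclasses import dataclass
from fractions import Fraction
from itertools import product
from typing import Dict, Union


@dataclass
class Node:
    """Internal node testing condition C_cond (1-based index)."""
    cond: int
    zero: "Tree"   # branch taken when the condition's answer is 0
    one: "Tree"    # branch taken when the answer is 1


Tree = Union[Node, str]  # a leaf is the name of the block it selects


def block_pr(block: str, vertex_pr: Dict[str, Fraction]) -> Fraction:
    """Pr(B): sum of Pr(v) over the vertices v covered by the block."""
    return sum((p for v, p in vertex_pr.items()
                if all(b in ("-", x) for b, x in zip(block, v))), Fraction(0))


def p_cost_tree(t: Tree, pr: Dict[str, Fraction], times: Dict[int, int], path: int = 0) -> Fraction:
    """Definition of C_P(T): Pr(B_i) times the total time on the path to leaf i."""
    if isinstance(t, str):
        return pr[t] * path
    step = path + times[t.cond]
    return p_cost_tree(t.zero, pr, times, step) + p_cost_tree(t.one, pr, times, step)


def p_cost_blocks(blocks: Dict[str, str], pr: Dict[str, Fraction], times: Dict[int, int]) -> Fraction:
    """Theorem 4.8: Pr(B_i) times the sum of t_k over positions k fixed to 0 or 1 in B_i."""
    return sum((pr[name] * sum(times[k + 1] for k, x in enumerate(b) if x != "-")
                for name, b in blocks.items()), Fraction(0))


# Example 4.6 with hypothetical numbers: uniform Pr(v) and t_1, t_2, t_3 = 1, 2, 3.
blocks = {"B1": "--1", "B2": "-10", "B3": "000", "B4": "100"}
vertex_pr = {"".join(v): Fraction(1, 8) for v in product("01", repeat=3)}
pr = {name: block_pr(b, vertex_pr) for name, b in blocks.items()}
times = {1: 1, 2: 2, 3: 3}
tree = Node(3, Node(2, Node(1, "B3", "B4"), "B2"), "B1")  # C_3 at the root

print(p_cost_tree(tree, pr, times), p_cost_blocks(blocks, pr, times))  # 17/4 17/4
```

Exact fractions are used so that the two sides can be compared with `==` rather than up to rounding.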
We learned that the P-cost of all decision trees realizing $\pi$ is the same, and it depends only on the partition $\pi$. We define the P-cost, $C_P(\pi)$, for a partition $\pi$ by

$$C_P(\pi) = \sum_i \Pr(B_i)\cdot\Big\{\sum_{k:\ x_{i_k} \in \{0,1\}} t_k\Big\}.$$

In a way similar to the case of #-cost, the above definition can be extended to a nonrealizable partition $\pi$.

To  clarify  differing  aspects  of  these  three  costs,  they  are 
summarized  in  the  following  proposition. 

PROPOSITION 4.5.

1. For a partition $\pi$ ($\in S_r$), its #-cost, $C(\pi)$, is defined by

$$C(\pi) = \#(\pi) - 1.$$

If $\pi$ is realizable ($\in S_0$), then all decision trees realizing $\pi$ have the same #-cost, $C(T(\pi)) = C(\pi) = \#(\pi) - 1$.

2. The M-cost, $C_M(T(\pi))$, is defined for a realizable partition $\pi$ ($\in S_0$) and a decision tree $T(\pi)$ realizing $\pi$ by

$$C_M(T(\pi)) = \sum_i s_i,$$

and it may vary from tree to tree.

3. For a partition $\pi$ ($\in S_r$), the P-cost, $C_P(\pi)$, is defined by

$$C_P(\pi) = \sum_i \Pr(B_i)\cdot\Big\{\sum_{k:\ x_{i_k} \in \{0,1\}} t_k\Big\}.$$

If $\pi$ is realizable ($\in S_0$), then all decision trees realizing $\pi$ have the same cost, $C_P(T(\pi)) = C_P(\pi)$.
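Proposition 4.5 can be illustrated with two trees that realize the same partition of the 2-cube into its four vertices, one testing $C_1$ first and one testing $C_2$ first. In the Python sketch below, the tree encoding, the storage sizes $s_k$, the times $t_k$ and the probabilities are all hypothetical; the #-cost and P-cost agree for the two trees while the M-cost does not.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Node:
    cond: int        # 1-based index k of the condition C_k tested here
    zero: "Tree"     # branch taken when C_k is answered 0
    one: "Tree"      # branch taken when C_k is answered 1


Tree = Union[Node, str]  # a leaf is the name of a block


def hash_cost(t: Tree) -> int:
    """#-cost: number of internal nodes."""
    return 0 if isinstance(t, str) else 1 + hash_cost(t.zero) + hash_cost(t.one)


def m_cost(t: Tree, s: Dict[int, float]) -> float:
    """M-cost: sum of the storage s_k of the condition at every internal node."""
    if isinstance(t, str):
        return 0
    return s[t.cond] + m_cost(t.zero, s) + m_cost(t.one, s)


def p_cost(t: Tree, pr: Dict[str, float], times: Dict[int, float], path: float = 0.0) -> float:
    """P-cost: sum over leaves of Pr(leaf block) times the processing time on its path."""
    if isinstance(t, str):
        return pr[t] * path
    step = path + times[t.cond]
    return p_cost(t.zero, pr, times, step) + p_cost(t.one, pr, times, step)


# Two trees realizing the same partition of the 2-cube into its four vertices.
t_a = Node(1, Node(2, "00", "01"), Node(2, "10", "11"))   # C_1 at the root
t_b = Node(2, Node(1, "00", "10"), Node(1, "01", "11"))   # C_2 at the root
s = {1: 3, 2: 1}                                          # hypothetical storage sizes
times = {1: 2.0, 2: 5.0}                                  # hypothetical processing times
pr = {v: 0.25 for v in ("00", "01", "10", "11")}          # hypothetical uniform Pr(v)

for name, t in (("T_a", t_a), ("T_b", t_b)):
    print(name, hash_cost(t), m_cost(t, s), p_cost(t, pr, times))
# T_a 3 5 7.0
# T_b 3 7 7.0
```

Putting the condition with the larger $s_k$ at the root ($T_a$) gives the smaller M-cost, which is the intuition the MLCF policy relies on.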

Now  we  can  present  the  following  proposition  concerning  the 
relationship  between  "cost  inequality"  and  "partition  inequality". 
PROPOSITION 4.6.

1. Assume $\pi_1, \pi_2 \in S_r$ and $\pi_1 < \pi_2$. Then
$C(\pi_1) > C(\pi_2)$ and $C_P(\pi_1) \ge C_P(\pi_2)$.

2. Assume $\pi_1, \pi_2 \in S_0$ and $\pi_1 < \pi_2$. Let $T_1$ and $T_2$ be decision trees realizing $\pi_1$ and $\pi_2$, respectively. Then, for some $T_1$ and $T_2$,
$C_M(T_1(\pi_1)) \ge C_M(T_2(\pi_2))$ holds, and
for other $T_1$ and $T_2$,
$C_M(T_1(\pi_1)) < C_M(T_2(\pi_2))$ holds.


PROOF:   1. It follows directly from the definitions of #-cost and P-cost.
2. We show the following example: $\pi_1$ and $\pi_2$ are both realizable and $\pi_1 < \pi_2$ holds.

[Figure: decision trees $T_1(\pi_1)$ and $T_2(\pi_2)$]

Since $C_M(T_1(\pi_1)) = 2s_1 + 2s_2 + s_3$ and $C_M(T_2(\pi_2)) = s_1 + s_2 + 2s_3$, the sign of $C_M(T_1(\pi_1)) - C_M(T_2(\pi_2)) = s_1 + s_2 - s_3$ may be positive or negative (or zero). Q.E.D.

Next we associate the above properties with procedures constructing decision trees, i.e., Procedure R and its related algorithms. We recall that Procedure R generally constructs a decision tree realizing $\pi$ if $\pi$ is realizable ($\in S_0$). First we consider the set $S_0$ of all realizable partitions. Propositions 4.5 and 4.6 lead to the following theorem.

THEOREM 4.9.

Assume $\pi$ is realizable ($\in S_0$). Then,

1. Procedure R with the LCF policy always constructs an optimal decision tree for both #-cost and P-cost.

2. However, Procedure R with the LCF policy may not construct an optimal decision tree for M-cost.

PROOF:   1. If $\pi$ is realizable, then Procedure R with the LCF policy constructs a decision tree realizing $\pi$, and, according to Proposition 4.5, its cost $C(T(\pi))$ (or $C_P(T(\pi))$) is the same for any other decision tree realizing $\pi$. Proposition 4.6 says that, for any $\pi'$ ($< \pi$), $C(\pi') > C(\pi)$ (or $C_P(\pi') \ge C_P(\pi)$) holds. Therefore, the constructed tree is optimal.
2. Procedure R with the LCF policy always constructs a decision tree realizing a partition $\pi$ if $\pi$ is realizable. However, Proposition 4.6 says that there may exist a partition $\pi'$ ($< \pi$) and a tree $T'(\pi')$ such that $C_M(T'(\pi')) < C_M(T(\pi))$. Therefore, $T(\pi)$ may not be optimal. Q.E.D.

The above theorem says that, if a given partition $\pi$ is realizable, Procedure R with the LCF policy works best for #-cost and P-cost but not for M-cost. For M-cost, an optimal decision tree may realize some $\pi'$ ($\pi' < \pi$) rather than the given partition $\pi$ itself.
Although the following modification of the LCF policy does not guarantee optimality, it suggests a simple and reasonable way of selecting a condition, $C_s$, at each step of Procedure R for M-cost.

Modified Lossfree-Condition-First (MLCF) Policy
At each step of Procedure R,

1) if there exists only one lossfree condition, $C_i$, then choose it, or

2) if there are several such conditions, then choose the $C_i$ whose $s_i$ is a maximum.

If this is applied to a realizable partition $\pi$, a decision tree $T(\pi)$ realizing $\pi$ (not a $\pi'$ such that $\pi' < \pi$) is constructed. Of course, it is an optimal tree for #-cost and P-cost. Since M-cost, in general, decreases if a condition $C_i$ with a larger $s_i$ is chosen at a higher level of the tree, the MLCF policy constructs a near-optimal tree, and it is the best among all decision trees realizing the realizable partition $\pi$.
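The MLCF selection rule is easy to state as code. In this Python sketch, an assumed encoding, the current subproblem $\sigma$ is a list of blocks written as strings over '0', '1' and '-', `free` lists the conditions not yet tested on the current path, and `s` maps a condition index to its storage requirement. The behaviour when no lossfree condition exists is left to the caller, since the policy does not specify it.

```python
from typing import Dict, List, Optional


def lossfree_conditions(sigma: List[str], free: List[int]) -> List[int]:
    """Untested conditions C_k that split no block of sigma (no '-' at position k)."""
    return [k for k in free if all(b[k - 1] != "-" for b in sigma)]


def mlcf_choice(sigma: List[str], free: List[int], s: Dict[int, int]) -> Optional[int]:
    """MLCF policy: the lossfree condition with the largest storage s_k.

    Returns None if there is no lossfree condition; the policy does not say
    what to do then, so the caller must decide.
    """
    candidates = lossfree_conditions(sigma, free)
    if not candidates:
        return None
    return max(candidates, key=lambda k: s[k])


example_46 = ["--1", "-10", "000", "100"]          # partition of Example 4.6
corners = ["00", "01", "10", "11"]                  # the 2-cube split into vertices
pinwheel = ["00-", "1-0", "-11", "010", "101"]      # hypothetical, no lossfree condition

print(mlcf_choice(example_46, [1, 2, 3], {1: 1, 2: 1, 3: 1}))  # 3
print(mlcf_choice(corners, [1, 2], {1: 3, 2: 1}))               # 1
print(mlcf_choice(pinwheel, [1, 2, 3], {1: 1, 2: 1, 3: 1}))     # None
```

Ties between equally large $s_k$ go to the first candidate in `free`, an arbitrary choice of this sketch.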

So far we have shown how Procedure R and its related LCF policy work for a realizable partition. Next we consider the set $S_r$ of all rule partitions. Our object is to find a realizable partition $\pi'$ for a given nonrealizable partition $\pi$ ($\in S_r$) such that $\pi' < \pi$ and $C(\pi')$ (or $C_P(\pi')$) is minimized. The following theorem identifies where such an optimal realizable partition $\pi'$ exists in $S_0$ for a given nonrealizable partition $\pi$ in $S_r$. We need the following terminology.
DEFINITION 4.5.

A subset B of a partially ordered set C is said to be maximal in C if and only if, for all $x \in B$ and all $y \in C$, either $x \ge y$ holds or else x and y are incomparable.

THEOREM 4.10.

For a nonrealizable partition $\pi$, assume an optimal decision tree for #-cost or P-cost realizes a partition $\pi'$ ($\pi' < \pi$, $\pi' \in S_0$ and $\pi \in S_r$). Then such a $\pi'$ must be in a maximal set of $S_0 \cap S_\pi$, where $S_\pi$ is the set of all partitions $\sigma$ such that $\sigma \le \pi$, i.e., $S_\pi = \{\sigma \in S_r \mid \sigma \le \pi\}$.

PROOF:   $\pi'$ should be in $S_0 \cap S_\pi$. If there exists a realizable partition $\pi'' \in S_\pi$ such that $\pi'' > \pi'$, then Proposition 4.6 says that $C(\pi'') < C(\pi')$ and $C_P(\pi'') \le C_P(\pi')$. This contradicts the fact that $\pi'$ is a partition realized by an optimal decision tree. Q.E.D.

According to the above theorem, an optimal solution is a maximal element of $S_0 \cap S_\pi$. Therefore, we can find an optimal solution for P-cost in this way. First we neglect all probabilities $\Pr(v)$ and the times $t_i$ required to process $C_i$, and enumerate all elements in the maximal set of all realizable partitions which are less than or equal to the given partition $\pi$. Then, we calculate the P-cost of all these elements. An optimal solution is one of these elements whose P-cost is a minimum.
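For very small n, the optimum described above can be computed without listing the maximal set explicitly. The Python sketch below is a swapped-in technique, not the thesis's procedure: an exhaustive dynamic program over subcubes that minimizes the P-cost over all decision trees whose leaves each lie inside one block of $\pi$, i.e., over all realizable $\pi' \le \pi$. It rewrites the P-cost as the sum, over internal nodes, of $t_k$ times the probability of reaching that node. Its work grows like $3^n$, so it is only a checking tool; the block encoding, uniform probabilities and times are assumptions.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product
from typing import Dict, List


def contains(outer: str, inner: str) -> bool:
    """True if the subcube `inner` lies inside the subcube `outer` (strings over 0, 1, -)."""
    return all(o == "-" or o == i for o, i in zip(outer, inner))


def min_p_cost(blocks: List[str], vertex_pr: Dict[str, Fraction], times: Dict[int, int]) -> Fraction:
    """Smallest P-cost over all trees whose leaves each lie inside a block of pi."""
    n = len(blocks[0])

    def pr(cube: str) -> Fraction:
        return sum((p for v, p in vertex_pr.items() if contains(cube, v)), Fraction(0))

    @lru_cache(maxsize=None)
    def best(cube: str) -> Fraction:
        if any(contains(b, cube) for b in blocks):
            return Fraction(0)               # the whole region is one rule: a leaf
        costs = []
        for k, x in enumerate(cube):
            if x == "-":                     # condition C_{k+1} not yet tested here
                lo = cube[:k] + "0" + cube[k + 1:]
                hi = cube[:k] + "1" + cube[k + 1:]
                costs.append(times[k + 1] * pr(cube) + best(lo) + best(hi))
        return min(costs)

    return best("-" * n)


vertex_pr = {"".join(v): Fraction(1, 8) for v in product("01", repeat=3)}  # hypothetical
times = {1: 1, 2: 2, 3: 3}                                                # hypothetical
print(min_p_cost(["--1", "-10", "000", "100"], vertex_pr, times))          # 17/4
print(min_p_cost(["00-", "1-0", "-11", "010", "101"], vertex_pr, times))   # 19/4
```

The first partition (Example 4.6) is realizable, so the minimum equals its own P-cost. The second is a hypothetical nonrealizable partition with no lossfree condition; its minimum, 19/4, lies above its own P-cost of 9/2, as Proposition 4.6 requires.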

REMARK 

A  partition  realized  by  an  optimal  decision  tree  for  M-cost 
may  not  be  in  this  maximal  set. 
This procedure, which is based on Theorem 4.10, is however impractical, because it requires an exhaustive enumeration of all elements of $S_0 \cap S_\pi$. As an alternative, in what follows we modify the iterated local minimization for #-cost (which was proposed in Section 4.2) so that the modified version can be applied to P-cost optimization problems. For this modification, we first analyze the inequality $C_P(\pi_1) \ge C_P(\pi_2)$ for $\pi_1 < \pi_2$, which was seen in Proposition 4.6. For #-cost, the corresponding inequality has been analyzed and associated with Procedure R in Proposition 4.2. It was shown there that

$$C(\pi_1) - C(\pi_2) = \sum_i \ell(C_{s_i}, \sigma_i),$$

where $\ell(C_{s_i}, \sigma_i)$ is the loss incurred by $C_{s_i}$ with respect to $\sigma_i$, and the sum $\sum_i \ell(C_{s_i}, \sigma_i)$, which is taken over all internal nodes i of the tree, is called the total loss. There is a similar result for $C_P(\pi_1) \ge C_P(\pi_2)$, as we now show.
DEFINITION 4.6.

Consider choosing $C_s$ at a step of Procedure R, where we are working with a partition $\sigma$. Then, the loss $\ell_P(C_s, \sigma)$ incurred by $C_s$ with respect to $\sigma$ is defined by

$$\ell_P(C_s, \sigma) = t_s\cdot\sum_k \Pr(B_k^{\sigma}),$$

where $B_k^{\sigma}$ is a block of $\sigma$ which is split by $C_s$. That is, the sum is taken over all $B_k^{\sigma}$ such that the component $x_s^k$ of the tuple $B_k^{\sigma} = (x_1^k, x_2^k, \ldots, x_n^k)$ is "-" (in other words, k is an index whose $B_k^{\sigma}$ is split by the condition $C_s$).
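Definition 4.6 translates directly into code. In the Python sketch below, the block encoding, the uniform vertex probabilities and the times $t_1 = 1$, $t_2 = 2$, $t_3 = 3$ are assumptions; $\ell_P$ is evaluated for each condition on the partition of Example 4.6, where $C_3$ splits no block and therefore has zero loss.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, List


def block_pr(block: str, vertex_pr: Dict[str, Fraction]) -> Fraction:
    """Pr(B): sum of Pr(v) over the vertices covered by the block."""
    return sum((p for v, p in vertex_pr.items()
                if all(b in ("-", x) for b, x in zip(block, v))), Fraction(0))


def p_loss(s: int, sigma: List[str], vertex_pr: Dict[str, Fraction], times: Dict[int, int]) -> Fraction:
    """l_P(C_s, sigma) = t_s * (sum of Pr(B) over the blocks of sigma with '-' at position s)."""
    split = [b for b in sigma if b[s - 1] == "-"]      # blocks that C_s would split
    return times[s] * sum((block_pr(b, vertex_pr) for b in split), Fraction(0))


vertex_pr = {"".join(v): Fraction(1, 8) for v in product("01", repeat=3)}  # hypothetical
times = {1: 1, 2: 2, 3: 3}                                                # hypothetical
sigma = ["--1", "-10", "000", "100"]                                      # Example 4.6
print(*(str(p_loss(s, sigma, vertex_pr, times)) for s in (1, 2, 3)))      # 3/4 1 0
```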
EXAMPLE 4.7.

Consider the following $\sigma$. If we choose a condition $C_s$ that splits only one block B of $\sigma$, then the loss is $\ell_P(C_s, \sigma) = t_s\cdot\Pr(B)$. If we choose another condition $C_{s'}$ that splits two blocks B' and B'', then the loss is $\ell_P(C_{s'}, \sigma) = t_{s'}\cdot\{\Pr(B') + \Pr(B'')\}$.

[Figure: the partition $\sigma$ of Example 4.7]

THEOREM 4.11.

Assume Procedure R is applied to a partition $\pi_2$, and the resultant decision tree realizes a partition $\pi_1$. Then,

$$C_P(\pi_1) - C_P(\pi_2) = \sum_i \ell_P(C_{s_i}, \sigma_i) = \sum_i t_{s_i}\cdot\Big\{\sum_k \Pr(B_k^{\sigma_i})\Big\},$$

where the sum $\sum_i \ell_P(C_{s_i}, \sigma_i)$ is taken over all internal nodes i of the tree.

The proof of this theorem is based on Theorem 4.8 and can be done in a similar way to the proof of Proposition 4.2. It is omitted, however.


EXAMPLE 4.8.

To verify Theorem 4.11, we show an example. Assume Procedure R is applied to the following partition $\pi_2$, and the constructed tree is $T(\pi_1)$, which realizes $\pi_1$.

[Figure: the partition $\pi_2$ and the tree $T(\pi_1)$]


According to Theorem 4.8, $C_P(\pi_2)$ is equal to

$$\Pr(B_1)\cdot t_3 + \Pr(B_2)\cdot(t_2 + t_3) + \Pr(B_3)\cdot(t_1 + t_2 + t_3) + \Pr(B_4)\cdot(t_1 + t_2 + t_3).$$

Also, $C_P(\pi_1)$ is calculated and equals

$$\begin{aligned}
&\Pr(B_1')\cdot(t_2 + t_3) + \Pr(B_1'')\cdot(t_1 + t_2 + t_3) + \Pr(B_1''')\cdot(t_1 + t_2 + t_3)\\
&+ \Pr(B_2)\cdot(t_2 + t_3) + \Pr(B_3)\cdot(t_1 + t_2 + t_3) + \Pr(B_4)\cdot(t_1 + t_2 + t_3).
\end{aligned}$$

Therefore,

$$\begin{aligned}
C_P(\pi_1) - C_P(\pi_2) &= t_2\cdot\{\Pr(B_1') + \Pr(B_1'') + \Pr(B_1''')\} + t_1\cdot\{\Pr(B_1'') + \Pr(B_1''')\}\\
&= t_2\cdot\Pr(B_1) + t_1\cdot\Pr(B_1'' \cup B_1''').
\end{aligned}$$

On the other hand, the total loss $\sum_i \ell_P(C_{s_i}, \sigma_i)$ is calculated as follows. The root $C_2$ of $T(\pi_1)$ splits a block, $B_1$; then the loss by $C_2$ at this step is $t_2\cdot\Pr(B_1)$. Next, $C_1$ of $T(\pi_1)$ splits $B_1'' \cup B_1'''$ into $B_1''$ and $B_1'''$, so the loss at this stage is $t_1\cdot\Pr(B_1'' \cup B_1''') = t_1\cdot\{\Pr(B_1'') + \Pr(B_1''')\}$. Since the other conditions of $T(\pi_1)$ generate zero loss, the total loss is

$$t_2\cdot\Pr(B_1) + t_1\cdot\{\Pr(B_1'') + \Pr(B_1''')\},$$

and this is equal to the quantity $C_P(\pi_1) - C_P(\pi_2)$.
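Theorem 4.11 can be checked on small cases by running a sketch of Procedure R and comparing the accumulated losses with the change in P-cost. The Python code below is a simplified reading of Procedure R (split on the chosen condition, recurse on both halves, stop when one block remains), not the thesis's full procedure; the fixed-priority chooser `by_priority`, the uniform probabilities and the times are hypothetical. It checks the identity for all six priority orders on the partition of Example 4.6.

```python
from fractions import Fraction
from itertools import permutations
from typing import Callable, List, Optional, Sequence, Tuple


def restrict(block: str, k: int, bit: str) -> Optional[str]:
    """Intersect a block with the half-cube x_k = bit; None if they are disjoint."""
    x = block[k - 1]
    if x != "-" and x != bit:
        return None
    return block[:k - 1] + bit + block[k:]


def pr_uniform(block: str) -> Fraction:
    """Assumed uniform Pr(v) on the n-cube: Pr(B) = 2**(number of '-') / 2**n."""
    return Fraction(2 ** block.count("-"), 2 ** len(block))


def p_cost(blocks: Sequence[str], times: dict) -> Fraction:
    """P-cost in the form of Theorem 4.8, extended to any rule partition."""
    return sum((pr_uniform(b) * sum(times[k + 1] for k, x in enumerate(b) if x != "-")
                for b in blocks), Fraction(0))


Chooser = Callable[[List[str], List[int]], int]


def procedure_r(sigma: List[str], free: List[int], choose: Chooser, times: dict) -> Tuple[List[str], Fraction]:
    """Split on the chosen condition until one block remains.

    Returns the blocks of the realized partition pi_1 and the total P-loss.
    """
    if len(sigma) == 1:
        return list(sigma), Fraction(0)
    s = choose(sigma, free)
    loss = times[s] * sum((pr_uniform(b) for b in sigma if b[s - 1] == "-"), Fraction(0))
    rest = [k for k in free if k != s]
    pieces: List[str] = []
    for bit in "01":
        sub = [r for r in (restrict(b, s, bit) for b in sigma) if r is not None]
        got, sub_loss = procedure_r(sub, rest, choose, times)
        pieces += got
        loss += sub_loss
    return pieces, loss


def by_priority(order: Sequence[int]) -> Chooser:
    """Hypothetical chooser: the first untested condition in a fixed order."""
    return lambda sigma, free: next(k for k in order if k in free)


pi_2 = ["--1", "-10", "000", "100"]     # partition of Example 4.6
times = {1: 1, 2: 2, 3: 3}              # hypothetical processing times
for order in permutations((1, 2, 3)):
    pi_1, total_loss = procedure_r(pi_2, [1, 2, 3], by_priority(order), times)
    gap = p_cost(pi_1, times) - p_cost(pi_2, times)
    print(order, total_loss, gap == total_loss)
```

Orders that start with $C_3$ incur no loss at all, which is the LCF behaviour on this realizable partition; the others show how much P-cost an unlucky first split adds.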

REMARK

The following Figure 4.6 suggests a simpler and more intuitive way to calculate the difference $C_P(\pi_1) - C_P(\pi_2)$. We compare $\pi_1$ with $\pi_2$, and in particular we observe their edges (i.e., 1-cubes). Since $\pi_1 < \pi_2$ holds, any edge of $\pi_1$ which is covered by a rule (the edges of $B_1'$ and $B_2$ in Example 4.8) corresponds to an edge of $\pi_2$ which is also covered by a rule (the two corresponding edges of $\pi_2$ are included in $B_1$ and $B_2$, respectively, in the example). However, there are edges of $\pi_1$ which are not covered by a rule, but whose corresponding edges of $\pi_2$ are covered by a rule. They are labeled by a, b, and c in Figure 4.6.

FIGURE 4.6.

If we calculate the following quantity for each of these edges a, b, and c,

$$t_i\cdot\{\Pr(v_{i_1}) + \Pr(v_{i_2})\},$$

where the vertices $v_{i_1}$ and $v_{i_2}$ are adjacent through an edge along the condition $C_i$, whose processing time is $t_i$, and add them together, then this sum is equal to the difference $C_P(\pi_1) - C_P(\pi_2)$.
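The edge-counting rule in the Remark can be checked the same way. The Python sketch below (same string encoding, hypothetical uniform probabilities and times) sums $t_k\{\Pr(v) + \Pr(w)\}$ over the edges that lie inside a rule of $\pi_2$ but not inside a rule of $\pi_1$, and compares the result with $C_P(\pi_1) - C_P(\pi_2)$. Here $\pi_2$ is the partition of Example 4.6 and $\pi_1$ is obtained by splitting (-,-,1) on $C_2$ and then splitting (-,0,1) on $C_1$, mirroring the loss computation of Example 4.8.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, List


def contains(block: str, v: str) -> bool:
    """True if vertex (or subcube) v lies in the subcube `block`."""
    return all(b == "-" or b == x for b, x in zip(block, v))


def block_of(v: str, blocks: List[str]) -> str:
    return next(b for b in blocks if contains(b, v))


def p_cost(blocks: List[str], vertex_pr: Dict[str, Fraction], times: Dict[int, int]) -> Fraction:
    """P-cost in the form of Theorem 4.8."""
    total = Fraction(0)
    for b in blocks:
        pr_b = sum((p for v, p in vertex_pr.items() if contains(b, v)), Fraction(0))
        total += pr_b * sum(times[k + 1] for k, x in enumerate(b) if x != "-")
    return total


def edge_difference(pi_1: List[str], pi_2: List[str],
                    vertex_pr: Dict[str, Fraction], times: Dict[int, int]) -> Fraction:
    """Sum of t_k * (Pr(v) + Pr(w)) over edges {v, w} along C_k that lie inside
    a rule of pi_2 but not inside a rule of pi_1."""
    total = Fraction(0)
    for v in vertex_pr:
        for k, x in enumerate(v):
            if x == "1":
                continue                      # count each edge once, from its 0 end
            w = v[:k] + "1" + v[k + 1:]
            inside_1 = block_of(v, pi_1) == block_of(w, pi_1)
            inside_2 = block_of(v, pi_2) == block_of(w, pi_2)
            if inside_2 and not inside_1:
                total += times[k + 1] * (vertex_pr[v] + vertex_pr[w])
    return total


vertex_pr = {"".join(v): Fraction(1, 8) for v in product("01", repeat=3)}  # hypothetical
times = {1: 1, 2: 2, 3: 3}                                                # hypothetical
pi_2 = ["--1", "-10", "000", "100"]
pi_1 = ["-11", "001", "101", "-10", "000", "100"]
print(edge_difference(pi_1, pi_2, vertex_pr, times))                        # 5/4
print(p_cost(pi_1, vertex_pr, times) - p_cost(pi_2, vertex_pr, times))      # 5/4
```

Three edges contribute here, two along $C_2$ and one along $C_1$, matching the three labelled edges a, b and c of the Remark.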


Now  we  present  an  algorithm  for  P-cost. 


ALGORITHM

At each step of Procedure R, choose $C_i$ such that $\ell_P(C_i, \sigma)$ is a minimum over all $\ell_P(C_s, \sigma)$ for the possible choices of $C_s$.

This algorithm does not always generate optimal trees, but it is based on plausible arguments, and it is easy to implement.
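The selection rule of this algorithm can be sketched in Python as follows. Ties are broken toward the lowest index, an arbitrary choice of this sketch, and uniform vertex probabilities are assumed. The example partition is hypothetical and has no lossfree condition, so the choice depends on the processing times.

```python
from fractions import Fraction
from typing import Dict, List


def pr_uniform(block: str) -> Fraction:
    """Assumed uniform vertex probabilities: Pr(B) = 2**(number of '-') / 2**n."""
    return Fraction(2 ** block.count("-"), 2 ** len(block))


def p_loss(s: int, sigma: List[str], times: Dict[int, int]) -> Fraction:
    """l_P(C_s, sigma) of Definition 4.6."""
    return times[s] * sum((pr_uniform(b) for b in sigma if b[s - 1] == "-"), Fraction(0))


def min_p_loss_choice(sigma: List[str], free: List[int], times: Dict[int, int]) -> int:
    """Pick the untested condition with the smallest l_P; ties go to the lowest index."""
    return min(sorted(free), key=lambda s: p_loss(s, sigma, times))


pinwheel = ["00-", "1-0", "-11", "010", "101"]   # hypothetical, no lossfree condition
print(min_p_loss_choice(pinwheel, [1, 2, 3], {1: 1, 2: 2, 3: 3}))  # 1
print(min_p_loss_choice(pinwheel, [1, 2, 3], {1: 3, 2: 2, 3: 1}))  # 3
```

With positive times and probabilities a lossfree condition has $\ell_P = 0$, so this rule picks one whenever it exists, just as the LCF policy would.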
5.   A DECOMPOSITION THEORY OF DECISION TABLES AND DECISION TREES

5.1.   Introduction 

The study of the decomposition of n-cube partitions is motivated by the possibility of processing large decision tables effectively in parallel. In order to give an intuitive feeling for our problem, we consider the following example.


EXAMPLE  5.1. 


Assume that the following decision tree T realizes a partition $\pi$.

[Figure: decision tree T]

If we remove the edges of T marked with ~, we obtain three smaller decision trees $T_1$, $T_2$ and $T_3$, as shown next.

[Figure: subtrees $T_1$, $T_2$ and $T_3$]

If three independent processors are available and each of them processes one $T_i$ (i = 1, 2, 3), we obtain three sets of outcomes, one from each processor; the first is $\{R_1, R_2, R_3\}$, and each of the other two also contains $R_2$. Taking the intersection of these three sets in the set-theoretical sense yields $R_2$; thus, we can identify the block $R_2$ of the original partition. This scheme can be visualized by the sketch of Figure 5.1. The intersection operation can be realized by a simple "Logical AND" function.


FIGURE 5.1. (Sketch of the parallel scheme, from INPUT to OUTPUT.)
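The intersection step of Example 5.1 can be mimicked in Python. In this sketch, an illustration with hypothetical data rather than the thesis's implementation, each processor reports the block of its own partition that contains the input vertex, the reports are intersected coordinate by coordinate (playing the role of the "Logical AND"), and the rule of $\pi$ containing the result is returned. The three single-condition processors are a hypothetical way of splitting up the partition of Example 4.6.

```python
from typing import List, Optional


def meet(a: str, b: str) -> Optional[str]:
    """Intersection of two subcubes (strings over '0', '1', '-'); None if disjoint."""
    out = []
    for x, y in zip(a, b):
        if x == "-":
            out.append(y)
        elif y == "-" or x == y:
            out.append(x)
        else:
            return None
    return "".join(out)


def contains(block: str, cube: str) -> bool:
    """True if the subcube `cube` lies inside the subcube `block`."""
    return all(b == "-" or b == c for b, c in zip(block, cube))


def identify(v: str, pi: List[str], parts: List[List[str]]) -> str:
    """Each processor reports the block of its partition containing v; AND-ing the
    reports (intersecting the subcubes) gives a region that must lie in one rule."""
    common = "-" * len(v)
    for part in parts:
        report = next(b for b in part if contains(b, v))   # one processor's output
        common = meet(common, report)                       # never None: all contain v
    return next(r for r in pi if contains(r, common))


pi = ["--1", "-10", "000", "100"]                          # partition of Example 4.6
parts = [["--0", "--1"], ["-0-", "-1-"], ["0--", "1--"]]   # hypothetical: one condition each
print(identify("100", pi, parts), identify("011", pi, parts), identify("110", pi, parts))
# 100 --1 -10
```

With only the first two processors, the input (1,0,0) would leave the region (-,0,0), which meets two rules, and the final lookup would find no rule; this is the situation the decomposition condition of Definition 5.1 excludes.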
We note that each processor deals with a smaller decision tree, and hence the average processing time of this scheme might be shorter than that for the single-processor case. In the succeeding sections of this chapter, a decomposition problem of a partition, which can be applied to the parallel processing of a decision table, will be formulated, and some objective functions for efficient decompositions will be introduced. After a theoretical analysis of this problem, the construction of a pair of decision trees for a given partition is discussed. Based on a procedure called Procedure D, a heuristic algorithm for efficient decompositions is also presented in the last section.

5.2.  Decomposition  Problem  and  Objective  Functions 

First we define the notion of a decomposition of a partition.

DEFINITION 5.1.

A set of n-cube partitions $\pi_i$ (i = 1, 2, ..., k) is called a k-decomposition (or simply a decomposition) of an n-cube partition $\pi$ if and only if

$$\prod_{i=1}^{k}\pi_i \le \pi.$$
This is a necessary and sufficient condition for our parallel processing scheme to work. Then, our decomposition problem is how to find a decomposition $\{\pi_i\}$ for a given partition $\pi$ so that we may do parallel processing effectively. As a measure of efficiency for decompositions, the following two objective functions are introduced. They are understood as extensions of the conventional cost of a decision tree discussed in the previous chapter.

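OBJECTIVE FUNCTION 1.

For a decomposition $\{\pi_i\}$ of $\pi$, objective function 1, $C_1(\pi_1, \pi_2, \ldots, \pi_k)$, is defined by

$$C_1(\pi_1, \pi_2, \ldots, \pi_k) = \#\Big(\prod_{i=1}^{k}\pi_i\Big),$$

where $\#(\pi)$ is the number of blocks of $\pi$.

OBJECTIVE FUNCTION 2.

For a decomposition $\{\pi_i\}$ of $\pi$, objective function 2, $C_2(\pi_1, \pi_2, \ldots, \pi_k)$, is defined by

$$C_2(\pi_1, \pi_2, \ldots, \pi_k) = \sum_{i=1}^{k}\{\#(\pi_i) - 1\}.$$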

In Example 5.3 both cost functions are illustrated. The first objective function, $C_1$, is the number of blocks of the partition $\prod_{i=1}^{k}\pi_i$. This number is related to the distance from $\pi$ to $\prod_{i=1}^{k}\pi_i$ on the lattice of all n-cube partitions, since $\#(\prod_{i=1}^{k}\pi_i) - \#(\pi)$ is the number of blocks which are split. The second one is based on more practical considerations. It corresponds to the total number of internal nodes of all decision trees, or equivalently to the total memory space requirement, since $\#(\pi_i) - 1$ is proportional to the storage space required for the i-th processor. (Each internal node of a decision tree is supposed to require one unit of storage space.) There is another objective function, which corresponds to the average processing time. We need, however, probabilities of occurrence of rules for its introduction. It will not be considered in this thesis.
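Definition 5.1 and the two objective functions can be evaluated directly on small examples. The Python sketch below computes the product partition as all nonempty intersections of one block taken from each part, tests the decomposition condition, and evaluates $C_1$ and $C_2$. The partition is that of Example 4.6; the candidate decompositions and the string encoding are hypothetical.

```python
from itertools import product
from typing import List, Optional


def meet(a: str, b: str) -> Optional[str]:
    """Intersection of two subcubes (strings over '0', '1', '-'); None if disjoint."""
    out = []
    for x, y in zip(a, b):
        if x == "-":
            out.append(y)
        elif y == "-" or x == y:
            out.append(x)
        else:
            return None
    return "".join(out)


def contains(block: str, cube: str) -> bool:
    return all(b == "-" or b == c for b, c in zip(block, cube))


def product_partition(parts: List[List[str]]) -> List[str]:
    """Blocks of the product of the parts: every nonempty intersection of one
    block taken from each part."""
    blocks = []
    for choice in product(*parts):
        cube = "-" * len(choice[0])
        for b in choice:
            cube = meet(cube, b)
            if cube is None:
                break
        if cube is not None:
            blocks.append(cube)
    return blocks


def is_decomposition(parts: List[List[str]], pi: List[str]) -> bool:
    """Definition 5.1: every block of the product lies inside some rule of pi."""
    return all(any(contains(r, c) for r in pi) for c in product_partition(parts))


def c1(parts: List[List[str]]) -> int:
    """Objective function 1: number of blocks of the product partition."""
    return len(product_partition(parts))


def c2(parts: List[List[str]]) -> int:
    """Objective function 2: sum over processors of (#(pi_i) - 1)."""
    return sum(len(p) - 1 for p in parts)


pi = ["--1", "-10", "000", "100"]                          # partition of Example 4.6
three = [["--0", "--1"], ["-0-", "-1-"], ["0--", "1--"]]   # hypothetical 3-decomposition
two = [["--0", "--1"], ["00-", "10-", "-1-"]]              # hypothetical 2-decomposition
print(is_decomposition(three, pi), c1(three), c2(three))   # True 8 3
print(is_decomposition(two, pi), c1(two), c2(two))         # True 6 3
print(is_decomposition(three[:2], pi))                     # False
```

Both decompositions have $C_2 = 3 = \#(\pi) - 1$, but the pair `two` refines $\pi$ less (6 blocks instead of 8), so it is better under $C_1$.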

So far, we have given the definition of a decomposition and have introduced two objective functions. The method of decomposition is restricted, as will be shown later. In order to explain the motivation for such a restriction, recall Example 5.1. The idea shown there immediately leads to the following statement.
PROPOSITION 5.1.

Assume that a decision tree T realizes an n-cube partition $\pi$. Then, a set of subtrees $T_i$ (i = 1, 2, ..., k) of T obtained by removing any set of (k-1) edges of T induces a k-decomposition, i.e.,

$$\prod_{i=1}^{k}\pi_i \le \pi,$$

where each $\pi_i$ (i = 1, 2, ..., k) is the n-cube partition realized by $T_i$.

To clarify this proposition, we use Example 5.1. To identify, say, a rule R, it is sufficient to be given the answers (i.e., "Yes" or "No") of the internal nodes along the path from the root to R. In the example, two of the subtrees already give such information for this rule. It is easy to see that the resulting set of trees $T_i$ obtained by the method generally provides more than such information.

This means that if k processors are available, then we may decompose the original tree T into k subtrees, $T_1, T_2, \ldots, T_k$, so that our scheme works. In other words, by collecting intermediate results from these processors and multiplying them (i.e., taking their intersection), we can identify a rule of the original $\pi$. Using the above result of Proposition 5.1, one more property can be given as follows.

PROPOSITION 5.2.

For a given realizable partition $\pi$, we can always achieve the inequality

$$\min_{\pi_1, \pi_2, \ldots, \pi_k:\ \prod_{i=1}^{k}\pi_i \le \pi} C_2(\pi_1, \pi_2, \ldots, \pi_k) \le \#(\pi) - 1$$

for arbitrary k (k = 1, 2, ...).
PROOF:   Assume a decision tree T realizes $\pi$. Applying the method in Proposition 5.1 to T yields k trees $T_i$ (i = 1, 2, ..., k) which realize $\pi_i$ such that $\prod_{i=1}^{k}\pi_i \le \pi$. Then, we can derive the following formulae:

$$\begin{aligned}
C_2(\pi_1, \pi_2, \ldots, \pi_k) &= \sum_{i=1}^{k}\{\#(\pi_i) - 1\}\\
&= \sum_{i=1}^{k}\{\text{the number of internal nodes of } T_i\}\\
&= \{\text{the number of internal nodes of } T\}\\
&= \#(\pi) - 1,
\end{aligned}$$

that is,

$$C_2(\pi_1, \pi_2, \ldots, \pi_k) = \#(\pi) - 1.$$

In other words, if $\pi$ is realizable, then there exists at least one k-decomposition of $\pi$ satisfying the expression in the statement. Q.E.D.
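The edge-removal construction of Proposition 5.1 can be sketched as well. In the Python code below, a removed edge is marked by wrapping the child subtree in a hypothetical `Cut` marker, and each remaining tree is read as a decision tree on the whole n-cube, with a removed edge acting as a leaf. For the tree of Example 4.6 the resulting parts satisfy the decomposition condition, and their $C_2$ equals the number of internal nodes of T, i.e., $\#(\pi) - 1$, as in the proof above.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple, Union


@dataclass
class Node:
    cond: int          # 1-based index k of the condition C_k tested here
    zero: "Tree"       # branch for answer 0
    one: "Tree"        # branch for answer 1


@dataclass
class Cut:
    """Hypothetical marker for a removed edge: `tree` becomes a separate subtree."""
    tree: "Tree"


Tree = Union[Node, Cut, str]   # a str is a leaf (the name of a rule)


def leaf_cubes(t: Tree, cube: str, forest: List[Tree]) -> List[str]:
    """Blocks realized by t on the n-cube, treating a removed edge as a leaf;
    subtrees below removed edges are appended to `forest`."""
    if isinstance(t, str):
        return [cube]
    if isinstance(t, Cut):
        forest.append(t.tree)
        return [cube]
    k = t.cond - 1
    return (leaf_cubes(t.zero, cube[:k] + "0" + cube[k + 1:], forest)
            + leaf_cubes(t.one, cube[:k] + "1" + cube[k + 1:], forest))


def decompose(tree: Tree, n: int) -> List[List[str]]:
    """Proposition 5.1: the n-cube partition realized by each remaining subtree."""
    forest: List[Tree] = [tree]
    parts: List[List[str]] = []
    while forest:
        parts.append(leaf_cubes(forest.pop(), "-" * n, forest))
    return parts


def internal_nodes(t: Tree) -> int:
    if isinstance(t, str):
        return 0
    if isinstance(t, Cut):
        return internal_nodes(t.tree)
    return 1 + internal_nodes(t.zero) + internal_nodes(t.one)


def refines(parts: List[List[str]], pi: List[str], n: int) -> bool:
    """Product of the parts <= pi: vertices no part separates share a rule of pi."""
    def index(v: str, blocks: List[str]) -> int:
        return next(i for i, b in enumerate(blocks)
                    if all(c in ("-", x) for c, x in zip(b, v)))

    seen: Dict[Tuple[int, ...], int] = {}
    for bits in product("01", repeat=n):
        v = "".join(bits)
        key = tuple(index(v, p) for p in parts)
        if seen.setdefault(key, index(v, pi)) != index(v, pi):
            return False
    return True


pi = ["--1", "-10", "000", "100"]                               # partition of Example 4.6
tree = Node(3, Cut(Node(2, Node(1, "B3", "B4"), "B2")), "B1")   # edge C_3 -> C_2 removed
parts = decompose(tree, 3)
print(parts)  # [['--0', '--1'], ['00-', '10-', '-1-']]
print(refines(parts, pi, 3), sum(len(p) - 1 for p in parts), internal_nodes(tree))  # True 3 3
```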

This proposition shows that the total memory requirement for our parallel processing scheme is at most the memory required for the single-processor case. A similar result can be obtained for the average processing time, but it will not be presented here. These facts show the advantages of parallel processing, i.e., a smaller storage requirement and a shorter processing time.

The method of decomposition stated in Proposition 5.1 is a useful tool for theoretical analysis: for example, it gave us the very basic result in Proposition 5.2. It has, however, a practical deficiency, as the following example shows.
EXAMPLE 5.2.

[Figure: the decision tree $T$.]

We assume three processors are available. Removing the two edges of $T$ that join a condition to the two identical conditions below it results in three isolated subtrees $T_1$, $T_2$ and $T_3$. Since $T_2$ and $T_3$ are the same, however, we actually need only $T_1$ and $T_2$ (or $T_3$). In other words, two processors are enough for this decomposition. We conclude that decomposing a decision tree by removing its edges is impractical.

In the above example, $T_3$ is exactly the same as $T_2$, and the set $\{T_1, T_2, T_3\}$ can be considered a redundant decomposition. Similar redundancy can also be found, to some extent, in Example 5.1: two copies of one condition appear in two different subtrees, and the same holds for two further pairs of identical conditions. In general, a better decomposition may need this kind of small redundancy, where the same conditions appear in different subtrees. However, to establish a simple decomposition theory, we exclude such redundancy, i.e., we put the following restriction on our decomposition problems.

RESTRICTION

A condition $C_i$ is processed by one and only one processor, i.e., a condition $C_i$ appears in only one of the subtrees.

For simplicity of the theoretical analysis of decomposition problems, we consider only the case $k = 2$, that is, the dual-processor case. However, the results that follow in this thesis can easily be extended to the general case. We introduce the following terminology.

DEFINITION 5.2.

If a pair of decision trees satisfies the Restriction, the trees are called an orthogonal pair of decision trees.

DEFINITION 5.3.

Let the $C_i$'s be the conditions corresponding to the coordinates of an $n$-cube partition $\pi$. Then an orthogonal decomposition of $\pi$ is a pair of $n$-cube partitions $\pi_1$ and $\pi_2$ such that

1) $\pi_1 \cdot \pi_2 \le \pi$, and

2) $S_1 \cap S_2 = \emptyset$ and $S_1 \cup S_2 = \{\text{all } C_i\text{'s}\}$,

where the essential condition set $S_j$ with respect to $\pi_j$ is defined by

$$S_j = \{C_i \mid C_i \text{ is essential to } \pi_j\}, \quad j = 1, 2.$$

Note that $C_i$ is said to be essential to $\pi_j$ if and only if there exists at least one block $(x_1, x_2, \ldots, x_i, \ldots, x_n)$ of $\pi_j$ such that $x_i$ is either 1 or 0 (not a dash) in its corresponding $C_i$ coordinate. If $C_i$ is not essential to an $n$-cube partition $\pi_j$, then $\pi_j$ degenerates to an $(n-1)$-cube partition by removing the $C_i$ coordinate; $\pi_j$ is then said to be essentially an $(n-1)$-cube partition.
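To make essentiality and degeneration concrete, here is a minimal Python sketch. It assumes an encoding that the text does not prescribe: each block of an $n$-cube partition is a string over 0, 1 and '-', one character per condition $C_1, \ldots, C_n$, with '-' standing for a dash. The function names and the example partition are hypothetical.

```python
from __future__ import annotations

# Assumed encoding: a block is a string such as "10-1", one character per
# condition C1..Cn, where "-" stands for a dash.


def essential_conditions(pi: frozenset[str]) -> set[int]:
    """1-based indices i such that C_i is essential to pi, i.e. some block
    has 0 or 1 (not a dash) in its C_i coordinate."""
    n = len(next(iter(pi)))
    return {i + 1 for i in range(n) if any(block[i] != "-" for block in pi)}


def degenerate(pi: frozenset[str]) -> frozenset[str]:
    """Drop every inessential coordinate of pi, leaving the #(S)-cube
    partition that pi essentially is."""
    keep = sorted(essential_conditions(pi))
    return frozenset("".join(block[i - 1] for i in keep) for block in pi)


# Hypothetical 4-cube partition in which C3 is never tested.
pi = frozenset({"0---", "10--", "11-0", "11-1"})
print(essential_conditions(pi))  # {1, 2, 4}
print(sorted(degenerate(pi)))    # ['0--', '10-', '110', '111']
```

In this example $C_3$ is inessential, so the partition is essentially a 3-cube partition.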

PROPOSITION 5.3.

If $\pi_1$ and $\pi_2$ form an orthogonal decomposition of $\pi$, then any $C_i \in S_1$ is inessential to $\pi_2$ and any $C_i \in S_2$ is inessential to $\pi_1$. Therefore, $\pi_1$ is essentially an $(n - \#(S_2))$-cube partition and $\pi_2$ is essentially an $(n - \#(S_1))$-cube partition, where $\#(A)$ for a set $A$ is the number of elements of $A$.

Some orthogonal decompositions are given in Example 5.3.

EXAMPLE 5.3.

We show four orthogonal decompositions $(\pi_1, \pi_2)$ of $\pi$. Their corresponding decision trees $T_1$ and $T_2$, essential condition sets $S_1$ and $S_2$, and two objective functions $C_1(\pi_1, \pi_2)$ and $C_2(\pi_1, \pi_2)$ are also shown.

[Figure: cases (a) to (d), each showing $\pi$, the trees $T_1(\pi_1)$ and $T_2(\pi_2)$, and $\pi_1 \cdot \pi_2$. Legible values: (a) $C_1(\pi_1, \pi_2) = 24$, $C_2(\pi_1, \pi_2) = 8$; (c) $C_1(\pi_1, \pi_2) = 16$, $C_2(\pi_1, \pi_2) = 6$; (d) $C_1(\pi_1, \pi_2) = 8$, $C_2(\pi_1, \pi_2) = 4$.]


In (a), $\pi_1 \cdot \pi_2 = \pi$, and no block of the original partition $\pi$ is split. This is the desirable case for a decomposition. In (b), however, $\pi_1 \cdot \pi_2 < \pi$, and $\#(\pi_1 \cdot \pi_2) - \#(\pi) = 9 - 5 = 4$. The partition $\pi$ in (c) always yields $\pi_1 \cdot \pi_2 = 0$ for any nontrivial orthogonal decomposition. (For any $\pi$, there exists a trivial orthogonal decomposition $(\pi, I)$ of $\pi$.) Case (d) also seems a good decomposition. In summary, (a) and (d) are good decompositions, and (b) is reasonably good. Case (c) is not effective, because we test all the conditions $C_1$, $C_2$, $C_3$ and $C_4$ for every possible rule of $\pi$.
5.3. Analysis of Orthogonal Decompositions

In this section we analyze orthogonal decompositions and give a necessary and sufficient condition for a pair of partitions (or decision trees) to be a decomposition of a given partition $\pi$. Based on the results obtained in this section, the synthesis procedure, i.e., how to construct orthogonal decompositions of a given partition, will be described in the next section. From now on, to avoid confusion with indices, we will write $\alpha$ and $\beta$ instead of $\pi_1$ and $\pi_2$, respectively.

First,  a  result  of  theoretical  analysis  concerning  an  orthogonal 
pair  of  decision  trees  is  presented. 

THEOREM 5.1.

Let $T_1$ and $T_2$ be an orthogonal pair of decision trees which realize partitions $\alpha$ and $\beta$, respectively. If we replace every terminal node of $T_1$ by the decision tree $T_2$, the new tree, denoted by $T_1 * T_2$, realizes the partition $\alpha \cdot \beta$.
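Theorem 5.1 can be checked on small cases with the following Python sketch. It builds $T_1 * T_2$ by grafting $T_2$ onto every terminal node of $T_1$ and compares the realized partition with the product $\alpha \cdot \beta$, computed as all nonempty intersections of blocks. The tree encoding, the function names and the example trees are assumptions made for illustration.

```python
from __future__ import annotations

# Assumed encodings: a decision tree is None (a terminal node) or a tuple
# (i, t0, t1) that tests condition C_i and continues in t0 when C_i = 0 and
# in t1 when C_i = 1; a block is a string over "01-", one character per condition.


def realized_partition(tree, n: int) -> frozenset[str]:
    """Blocks given by the root-to-terminal paths of tree in the n-cube."""
    def walk(t, cube: list[str]) -> set[str]:
        if t is None:
            return {"".join(cube)}
        i, t0, t1 = t
        blocks: set[str] = set()
        for value, sub in (("0", t0), ("1", t1)):
            child = cube.copy()
            child[i - 1] = value
            blocks |= walk(sub, child)
        return blocks
    return frozenset(walk(tree, ["-"] * n))


def compose(t1, t2):
    """T1 * T2: replace every terminal node of t1 by a copy of t2."""
    if t1 is None:
        return t2
    i, a, b = t1
    return (i, compose(a, t2), compose(b, t2))


def intersect(a: str, b: str) -> str | None:
    """Intersection of two cubes, or None when they are disjoint."""
    out = []
    for x, y in zip(a, b):
        if x != "-" and y != "-" and x != y:
            return None
        out.append(y if x == "-" else x)
    return "".join(out)


def multiply(p: frozenset[str], q: frozenset[str]) -> frozenset[str]:
    """The product p . q: all nonempty intersections of blocks of p and q."""
    return frozenset(c for a in p for b in q if (c := intersect(a, b)) is not None)


# Hypothetical orthogonal pair: T1 tests only C4 and C2, T2 only C1 and C3.
T1 = (4, None, (2, None, None))
T2 = (1, (3, None, None), None)
alpha, beta = realized_partition(T1, 4), realized_partition(T2, 4)
print(realized_partition(compose(T1, T2), 4) == multiply(alpha, beta))  # True
print(len(multiply(alpha, beta)))  # 9
```

Swapping the arguments of `compose` gives the same partition, in line with Corollary 5.2 below.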

EXAMPLE 5.4.

Consider case (b) of Example 5.3 again, with $\pi_1$ and $\pi_2$ rewritten as $\alpha$ and $\beta$, respectively. The decision trees $T_1$ and $T_2$ realizing $\alpha$ and $\beta$, respectively, as well as the essential condition sets $S_1$ and $S_2$, are also shown.

[Figure: $\alpha$ and $T_1(\alpha)$; $\beta$ and $T_2(\beta)$.]

$S_1 = \{C_2, C_4\}$, $S_2 = \{C_1, C_3\}$

Since $C_2$ and $C_4$ ($\in S_1$) are inessential to $\beta$, $\beta$ is essentially a 2-cube partition. Its degenerated 2-cube partition $\beta^*$ is shown below in Figure 5.2.


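[Figure 5.2: the degenerated 2-cube partition $\beta^*$.]

Then the tree $T_1 * T_2$ and the partition $\alpha \cdot \beta$ are shown in Figure 5.3.

[Figure 5.3: the tree $T_1 * T_2$ and the partition $\alpha \cdot \beta$.]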


We note that a subcube $\alpha_i$ of $\alpha$ corresponding to the $i$-th terminal node of $T_1$ is further refined by the attached subtree $T_2$; for example, the block $\alpha_1$ is refined into three blocks, one for each block of $\beta$. These refinements of $\alpha_i$ ($i = 1, 2, 3$) are shown in Figure 5.4 (a), (b), and (c), respectively.

[Figure 5.4: (a) $\alpha_1 * \beta$, (b) $\alpha_2 * \beta$, (c) $\alpha_3 * \beta$.]


The concept of "refinement of a block $\alpha_i$ of a partition $\alpha$ by a subtree $T_2$", or "refinement of a block $\alpha_i$ by another partition $\beta$", as seen in the above example, is intuitive and straightforward. Its rigorous definition is given as follows.


DEFINITION 5.4.

Let $\lambda$ and $U_\lambda$ be an $\ell$-cube partition and its coordinate set, i.e.,

$$U_\lambda = \{C_i \mid C_i \text{ is a coordinate of } \lambda\}$$

and $\#(U_\lambda) = \ell$. Let $\omega$ and $S_\omega$ be an $m$-cube partition and its corresponding essential coordinate set, respectively, i.e.,

$$S_\omega = \{C_i \mid C_i \text{ is essential to } \omega\}.$$

(Now we see that $\omega$ is essentially a $\#(S_\omega)$-cube partition.) By $\omega^*$ we denote $\omega$'s degenerated $\#(S_\omega)$-cube partition. If $S_\omega \subseteq U_\lambda$ holds, then $\omega^*$ can be extended to its $\ell$-cube extension $\tau$ by adding all conditions in $U_\lambda - S_\omega$. Then we define the refinement $\lambda * \omega$ of $\lambda$ by $\omega$ by

$$\lambda * \omega \equiv \lambda \cdot \tau.$$
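Definition 5.4 translates almost step by step into code. In the following Python sketch (an illustration, not taken from the text), a partition carries its own tuple of condition indices, so that partitions of different sizes can be combined; `refine` degenerates $\omega$ to $\omega^*$, extends it to $\tau$ over $U_\lambda$, and returns $\lambda \cdot \tau$. The data are hypothetical but shaped like Example 5.5.

```python
from __future__ import annotations

# Assumed encoding: a partition is a pair (coords, blocks), where coords is a
# tuple of condition indices and every block is a string over "01-" with one
# character per entry of coords.


def degenerate(coords, blocks):
    """omega*: keep only the coordinates that are essential to omega."""
    keep = [k for k in range(len(coords)) if any(b[k] != "-" for b in blocks)]
    return (tuple(coords[k] for k in keep),
            frozenset("".join(b[k] for k in keep) for b in blocks))


def extend(coords, blocks, target):
    """tau: add every coordinate of target missing from coords, as a dash."""
    out = set()
    for b in blocks:
        value = dict(zip(coords, b))
        out.add("".join(value.get(c, "-") for c in target))
    return tuple(target), frozenset(out)


def intersect(a, b):
    """Intersection of two cubes over the same coordinates, or None."""
    out = []
    for x, y in zip(a, b):
        if x != "-" and y != "-" and x != y:
            return None
        out.append(y if x == "-" else x)
    return "".join(out)


def refine(lam, omega):
    """lambda * omega = lambda . tau (requires S_omega to be a subset of U_lambda)."""
    lam_coords, lam_blocks = lam
    s_coords, s_blocks = degenerate(*omega)
    if not set(s_coords) <= set(lam_coords):
        raise ValueError("S_omega is not a subset of U_lambda")
    _, tau = extend(s_coords, s_blocks, lam_coords)
    return lam_coords, frozenset(
        c for a in lam_blocks for b in tau if (c := intersect(a, b)) is not None)


# Hypothetical data shaped like Example 5.5: lambda is a single cube over
# U = {C1, C2, C3}; omega is a 4-cube partition with S_omega = {C1, C3}.
lam = ((1, 2, 3), frozenset({"---"}))
omega = ((1, 2, 3, 4), frozenset({"0-0-", "0-1-", "1---"}))
coords, blocks = refine(lam, omega)
print(coords, sorted(blocks))  # (1, 2, 3) ['0-0', '0-1', '1--']
```

Because $\lambda$ here is a single cube, $\lambda * \omega$ is simply $\tau$ itself.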
PROPOSITION 5.4.

If $U_\lambda = S_\omega$ holds, then $\lambda * \omega = \lambda \cdot \omega^*$.


EXAMPLE 5.5.

We explain the refinement $\alpha_1 * \beta$ of $\alpha_1$ by $\beta$, using $\alpha_1$ and $\beta$ of Example 5.4.

[Figure: $\alpha_1$ with $U_{\alpha_1} = \{C_1, C_2, C_3\}$, and $\beta$ with $S_\beta = \{C_1, C_3\}$.]

$\alpha_1$ is a 3-cube partition and its coordinate set is $U_{\alpha_1} = \{C_1, C_2, C_3\}$. $\beta$ is a 4-cube partition and its essential coordinate set is $S_\beta = \{C_1, C_3\}$. Then $\beta$ is essentially a 2-cube partition, and its degenerated form $\beta^*$ is shown in Figure 5.5. Since $S_\beta \subseteq U_{\alpha_1}$ holds, $\beta^*$ can be extended to the 3-cube partition $\tau$ of Figure 5.6 by adding the coordinate $C_2 \in U_{\alpha_1} - S_\beta$. Finally, $\alpha_1 * \beta = \alpha_1 \cdot \tau$ is shown in Figure 5.7.


[Figure 5.5: the degenerated partition $\beta^*$.]

[Figure 5.6: the 3-cube extension $\tau$ of $\beta^*$.]

[Figure 5.7: $\alpha_1 * \beta = \alpha_1 \cdot \tau$.]

NOTE


In most cases in this thesis, $\lambda$ is simply a cube itself (as seen in this example) rather than a general cube partition; in other words, it is a partition consisting of only one block. Now we give the proof of Theorem 5.1.


PROOF: By replacing the $i$-th terminal node of $T_1$ by the subtree $T_2$, its corresponding block $\alpha_i$ of $\alpha$ is refined by $\beta$. ($\alpha_i * \beta$ is well-defined because of the orthogonality of $T_1$ and $T_2$.) Doing this replacement for all $i$, every block $\alpha_i$ of $\alpha$ is refined into $\alpha_i * \beta$ by $\beta$. This implies that the terminal nodes of $T_1 * T_2$ correspond one-to-one to the blocks of the partitions $\alpha_i * \beta$. Since a block of $\alpha_i * \beta$ is a block of $\alpha \cdot \beta$, $T_1 * T_2$ realizes $\alpha \cdot \beta$. Q.E.D.
Since there is no essential difference between the roles of $\alpha$ and $\beta$ (or $T_1$ and $T_2$), $T_1 * T_2$ and $T_2 * T_1$ realize the same partition $\alpha \cdot \beta$.

COROLLARY 5.2.

Let a pair of decision trees $T_1$ and $T_2$ be orthogonal and realize partitions $\alpha$ and $\beta$, respectively. Then $T_1 * T_2$ and $T_2 * T_1$ realize the product $\alpha \cdot \beta$.

Next we derive a necessary and sufficient condition for an orthogonal pair of decision trees to realize a decomposition of a given partition $\pi$. The fundamental result is Lemma 5.1, and Theorem 5.3 is the final and complete statement of the necessary and sufficient condition. Based on Theorem 5.3, the synthesis problem, i.e., constructing an orthogonal decomposition of a given partition $\pi$, will be discussed in the succeeding sections.

Assume the following orthogonal pair of decision trees, $T_1$ and $T_2$, and let $\alpha$ and $\beta$ be the partitions realized by $T_1$ and $T_2$, respectively.

[Figure: the orthogonal pair of decision trees $T_1$ and $T_2$.]
To analyze the relationship between $\alpha \cdot \beta$ and $\pi$, recall Procedure R in Chapter 4. Suppose that we are constructing a decision tree for a given $\pi$, and regard $T_1$ as a growing (intermediate) decision tree under construction by Procedure R (i.e., $T_1$ has been partially constructed by some steps of Procedure R). We then associate a partition $\pi_i$ with the $i$-th terminal node of $T_1$; $\pi_i$ is the partition which is to be realized by a forthcoming subtree rooted at that terminal node. (Note that the $\pi_i$'s are different from the $\alpha_i$'s: $\alpha_i$ is simply a subcube, as a block of $\alpha$.) Then the following Lemma 5.1 is obtained.


LEMMA 5.1.

Assume that $T_1$ and $T_2$ are orthogonal and realize $\alpha$ and $\beta$, respectively. Then a necessary and sufficient condition for the pair $\alpha$ and $\beta$ to be an orthogonal decomposition of $\pi$ is

$$\alpha_i * \beta \le \pi_i \quad \text{for every } i.$$


PROOF: First we prove the necessity by contradiction. Assume that $\alpha_i * \beta \le \pi_i$ does not hold for a particular $i$. Then there exists a pair of 0-cubes $a$ and $b$ such that

1) $a$ and $b$ are in a block of $\alpha_i * \beta$, and

2) $a$ is in a block of $\pi_i$, but $b$ is in another block of $\pi_i$.

Condition 2) means that $a$ and $b$ are in different blocks of $\pi$, since two different blocks of $\pi_i$ lie in different blocks of $\pi$. We are thus led to the fact that $a$ and $b$ are in a block of $\alpha_i * \beta$, i.e., in a block of $\alpha \cdot \beta$, but not in a block of $\pi$. This shows that $\alpha \cdot \beta \le \pi$ does not hold. Next, the sufficiency is proved. If $\alpha_i * \beta \le \pi_i$ for every $i$, then any block of $\alpha \cdot \beta$, being a block of $\alpha_i * \beta$ for some $i$, lies in a block of $\pi_i$. A block of $\pi_i$, however, lies in a block of $\pi$. Therefore, $\alpha \cdot \beta \le \pi$ holds; that is, $\alpha$ and $\beta$ form an orthogonal decomposition of $\pi$. Q.E.D.
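Lemma 5.1 can be tested mechanically. The Python sketch below makes a simplifying assumption: everything is embedded in the full $n$-cube, so $\alpha_i * \beta$ becomes $\{\alpha_i\} \cdot \beta$ and $\pi_i$ becomes $\pi$ restricted to the subcube $\alpha_i$. It compares the per-node test of the lemma with the direct test $\alpha \cdot \beta \le \pi$; the partitions are hypothetical.

```python
from __future__ import annotations

# Assumed encoding: blocks are strings over "01-", one character per condition,
# and every partition lives in the full n-cube.


def intersect(a, b):
    """Intersection of two cubes, or None when they are disjoint."""
    out = []
    for x, y in zip(a, b):
        if x != "-" and y != "-" and x != y:
            return None
        out.append(y if x == "-" else x)
    return "".join(out)


def multiply(p, q):
    """Product p . q: all nonempty intersections of blocks."""
    return frozenset(c for a in p for b in q if (c := intersect(a, b)) is not None)


def contained(a, b):
    """True when cube a lies inside cube b."""
    return all(y == "-" or x == y for x, y in zip(a, b))


def leq(p, q):
    """p <= q: every block of p lies inside some block of q."""
    return all(any(contained(a, b) for b in q) for a in p)


def lemma_5_1(alpha, beta, pi):
    """Per-terminal-node test: alpha_i * beta <= pi_i for every block alpha_i,
    where pi_i is pi restricted to the cube alpha_i."""
    return all(leq(multiply({a}, beta), multiply(pi, {a})) for a in alpha)


alpha = frozenset({"---0", "-0-1", "-1-1"})  # realized by a tree testing C4, C2
beta = frozenset({"0-0-", "0-1-", "1---"})   # realized by a tree testing C1, C3
for pi in (frozenset({"0---", "1---"}), frozenset({"--0-", "--1-"})):
    print(lemma_5_1(alpha, beta, pi), leq(multiply(alpha, beta), pi))
# True True
# False False
```

Both tests agree on each candidate $\pi$, as the lemma asserts.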

EXAMPLE 5.6.

To verify the above statement, we use case (b) of Example 5.3 and also refer to Example 5.4.

If we regard $T_1$ as a decision tree partially constructed by Procedure R, we can associate the following partitions $\pi_1$, $\pi_2$ and $\pi_3$ with the respective terminal nodes of $T_1$, as shown in Figure 5.8.
[Figure 5.8: $T_1$ with the partitions $\pi_1$, $\pi_2$, $\pi_3$ associated with its terminal nodes.]

As we have shown in Example 5.4, each $\alpha_i * \beta$ is as follows:

[Figure: $\alpha_1 * \beta \le \pi_1$, $\alpha_2 * \beta \le \pi_2$, $\alpha_3 * \beta \le \pi_3$.]

It is obvious that the inequality $\alpha_i * \beta \le \pi_i$ holds for every $i$, and we can conclude that this $(\alpha, \beta)$ is an orthogonal decomposition of $\pi$.


If we modify the tree $T_2$ into $T_2'$ as shown in Figure 5.9, then the $\alpha_i * \beta'$ are obtained in a similar way.


[Figure 5.9: the modified tree $T_2'(\beta')$ and the refinements $\alpha_1 * \beta'$, $\alpha_2 * \beta'$, $\alpha_3 * \beta'$.]


Each $\alpha_i * \beta'$ is less than or equal to its corresponding $\pi_i$. Therefore, $(\alpha, \beta')$ is another decomposition of $\pi$. $\alpha \cdot \beta'$ is also shown in Figure 5.10.

[Figure 5.10: $\alpha \cdot \beta'$.]


Next we present some properties which allow Theorem 5.3 to be stated more neatly.
PROPERTY 5.1.

1) $\alpha_i$ and $\pi_i$ always have the same coordinate set, i.e., $U_{\alpha_i} = U_{\pi_i}$ for every $i$.

2) $\alpha_i * \beta$ is always well-defined.

PROOF: 1) is obvious, and its proof is omitted. To prove 2), it is sufficient to show that $S_\beta \subseteq U_{\alpha_i}$ holds, where $S_\beta$ is the set of essential conditions of $\beta$. Let $S_\alpha$ be the set of essential conditions of $\alpha$, and let $V_i$ be the set of conditions which appear along the path from the root to the $i$-th terminal node of $T_1$. Then $V_i \subseteq S_\alpha$ holds. Therefore, the coordinate set $U_{\alpha_i}$ of $\alpha_i$ satisfies

$$U_{\alpha_i} = \{\text{all conditions}\} - V_i = (S_\alpha \cup S_\beta) - V_i \supseteq S_\beta,$$

since $V_i \subseteq S_\alpha$. In other words, a refinement of $\alpha_i$ by $\beta$ can be defined. Q.E.D.

PROPERTY 5.2.

Suppose that $T_1$ and $T_2$ realize an orthogonal decomposition $(\alpha, \beta)$ of $\pi$. If the $i$-th terminal node of $T_1$ is located at the $\ell$-th level of $T_1$ (i.e., $\ell$ internal nodes lie on the path from the root to this node) and we let $V_i$ be the set of conditions corresponding to these $\ell$ internal nodes, then the conditions in $S_\alpha - V_i$ are inessential to $\pi_i$; i.e., $\#(S_\alpha) - \#(V_i) = \#(S_\alpha) - \ell$ conditions are inessential to $\pi_i$.


PROOF: We know that $U_{\pi_i} = (S_\alpha \cup S_\beta) - V_i$ and $V_i \subseteq S_\alpha$ (by Property 5.1). As shown in the proof of Property 5.1,

$$U_{\alpha_i} - S_\beta = U_{\pi_i} - S_\beta = S_\alpha - V_i$$

also holds. The conditions added to form $\alpha_i * \beta$ are exactly those in the above $S_\alpha - V_i$. Since $\alpha_i * \beta \le \pi_i$ holds, these augmented conditions are also inessential to $\pi_i$. Q.E.D.

In short, Property 5.2 says that if the conditions $C_p, C_q, \ldots, C_r$ are those appearing along the path from the root to the $i$-th terminal node of $T_1$, then the conditions in $S_\alpha - \{C_p, C_q, \ldots, C_r\}$, i.e., the conditions of $S_\alpha$ that do not appear on this path, are inessential to $\pi_i$.


LEMMA 5.2.

Make the same assumption as in Property 5.2, and let $S_{\pi_i}$ be the set of essential conditions of $\pi_i$. Then all $S_{\pi_i}$ are the same set for every $i$, and they are equal to $S_\beta$.

PROOF: As we have seen in Property 5.1, the set $U_{\pi_i}$ of $\pi_i$'s coordinates is $U_{\pi_i} = (S_\beta \cup S_\alpha) - V_i$. Property 5.2, however, says that the conditions in $S_\alpha - V_i$ are inessential to $\pi_i$. Therefore, the essential coordinate set $S_{\pi_i}$ of $\pi_i$ is

$$S_{\pi_i} = \left((S_\beta \cup S_\alpha) - V_i\right) - (S_\alpha - V_i) = S_\beta.$$

Q.E.D.

Lemma 5.2 indicates that if $T_1$ and $T_2$ realize an orthogonal decomposition $(\alpha, \beta)$ of $\pi$, then all the partitions $\pi_i$ and $\beta$ are essentially $\#(S_\beta)$-cube partitions with the same set of essential conditions.

If we define multiplication and inequality between two partitions of different sizes which have the same set of essential conditions (as a natural extension of the conventional multiplication and inequality between two partitions of the same size), the condition $\alpha_i * \beta \le \pi_i$ in Lemma 5.1 can be rewritten simply as $\beta \le \pi_i$. This is obvious, since $\alpha_i$ is a subcube and only serves to adjust the difference between the coordinate sets of $\beta$ and $\pi_i$; there is, however, no difference between $S_\beta$ and $S_{\pi_i}$. Therefore, we can compare $\beta$ and $\pi_i$ in terms of this newly defined inequality.

THEOREM 5.3.

Assume that $T_1$ and $T_2$ are orthogonal and realize $\alpha$ and $\beta$. Then

1) for this pair $(\alpha, \beta)$ to be an orthogonal decomposition of $\pi$, $S_{\pi_i} = S_\beta$ must hold for all $i$, and

2) a necessary and sufficient condition for this $(\alpha, \beta)$ to be an orthogonal decomposition of $\pi$ is that

$$\beta \le \pi_i \quad \text{holds for all } i.$$

PROOF: 1) is the same as Lemma 5.2. 2) is derived from Lemma 5.1 and Lemma 5.2. Q.E.D.

This theorem is the main result of this section. Based on this analysis, the following section discusses the synthesis problem of decompositions.

5.4. Synthesis of Orthogonal Decompositions

In this section we present a procedure, called Procedure D, for constructing orthogonal decompositions of $\pi$. Theorem 5.3 plays a key role in that procedure: once $T_1$ is given and the $\pi_i$ are determined, we check whether their essential condition sets $S_{\pi_i}$ are all the same. If they are identical sets, then $\beta$, the counterpart of $\alpha$, can be determined by

$$\beta = \prod_{\text{all } i} \pi_i.$$

We always use the equality $\beta = \prod_{\text{all } i} \pi_i$ instead of $\beta \le \prod_{\text{all } i} \pi_i$, since the maximum element over all $\beta'$ such that $\beta' \le \prod_{\text{all } i} \pi_i$ is $\prod_{\text{all } i} \pi_i$, and $\beta$ is always better than any $\beta'$ with $\beta' \le \beta$ for the objective functions previously defined. That is:

PROPOSITION 5.5.

If $(\alpha, \beta)$ is an orthogonal decomposition of $\pi$, then so is $(\alpha, \beta')$ for any $\beta'$ such that $\beta' \le \beta$. Furthermore, the following two inequalities concerning the objective functions hold:

$$C_1(\alpha, \beta) \le C_1(\alpha, \beta'), \quad \text{and} \quad C_2(\alpha, \beta) \le C_2(\alpha, \beta').$$

The proof is omitted since it is obvious.
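The recipe $\beta = \prod_{\text{all } i} \pi_i$ can be sketched in Python as follows. As an assumption for illustration, each $\pi_i$ is represented inside the full $n$-cube: $\pi$ is restricted to the subcube $\alpha_i$ and the path coordinates of $\alpha_i$ are reset to dashes, so that all $\pi_i$ share one coordinate space. The hypothetical function `counterpart` reports each $S_{\pi_i}$ together with the product $\beta$; the example partition and tree are hypothetical too.

```python
from __future__ import annotations

# Assumed encoding: blocks are strings over "01-", one character per
# condition; T1 is described by the cubes alpha_i of its terminal nodes.


def intersect(a, b):
    """Intersection of two cubes, or None when they are disjoint."""
    out = []
    for x, y in zip(a, b):
        if x != "-" and y != "-" and x != y:
            return None
        out.append(y if x == "-" else x)
    return "".join(out)


def multiply(p, q):
    """Product p . q: all nonempty intersections of blocks."""
    return frozenset(c for a in p for b in q if (c := intersect(a, b)) is not None)


def contained(a, b):
    return all(y == "-" or x == y for x, y in zip(a, b))


def leq(p, q):
    return all(any(contained(a, b) for b in q) for a in p)


def counterpart(pi, alpha):
    """Return the list of essential sets S_{pi_i} and beta = product of all pi_i."""
    n = len(next(iter(pi)))
    beta = frozenset({"-" * n})                # start from the one-block partition I
    essential_sets = []
    for a in alpha:
        pi_i = multiply(pi, {a})               # pi restricted to the cube alpha_i
        # Reset the path coordinates (those fixed in alpha_i) to dashes.
        lifted = frozenset("".join("-" if a[k] != "-" else b[k] for k in range(n))
                           for b in pi_i)
        essential_sets.append({k + 1 for k in range(n)
                               if any(b[k] != "-" for b in lifted)})
        beta = multiply(beta, lifted)
    return essential_sets, beta


# Hypothetical 4-cube partition and a tree T1 testing C4 and then C2.
pi = frozenset({"0-00", "0-10", "1--0", "0001", "0011",
                "10-1", "0101", "0111", "11-1"})
alpha = frozenset({"---0", "-0-1", "-1-1"})
sets, beta = counterpart(pi, alpha)
print(sets)                            # [{1, 3}, {1, 3}, {1, 3}]
print(sorted(beta))                    # ['0-0-', '0-1-', '1---']
print(leq(multiply(alpha, beta), pi))  # True
```

Here every $S_{\pi_i}$ equals $\{C_1, C_3\}$, and the resulting pair satisfies $\alpha \cdot \beta \le \pi$.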

Now consider an example to illustrate Procedure D. A rigorous description of the procedure will be given later.

EXAMPLE 5.7.

Assume the 4-cube partition $\pi$ shown in Figure 5.11. If $T_1$ is the decision tree consisting of the root $C_1$ only, the counterpart $\beta$ is determined by $\beta = \pi_1 \cdot \pi_2$, and it is also shown. $\alpha \cdot \beta$ is also shown in Figure 5.11, and it is verified that $\alpha \cdot \beta \le \pi$ holds.
[Figure 5.11: the partition $\pi$, $T_1(\alpha)$, $T_2(\beta)$ with $\beta = \pi_1 \cdot \pi_2$, and $\alpha \cdot \beta$.]
Now try to expand the tree $T_1$ into the tree $T_1'$ of Figure 5.12 (a) by replacing the two terminal nodes of $T_1$ by two copies of one condition. All four $\pi_i'$ are also shown, and their essential condition sets $S_{\pi_i'}$ are all the same two-element set. The counterpart $\beta'$ of $\alpha'$ is determined by $\beta' = \prod_{i=1}^{4} \pi_i'$ and is also given in Figure 5.12 (a). We can construct the decision tree $T_2'$ realizing $\beta'$.


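[Figure 5.12 (a): $T_1'(\alpha')$ with the partitions $\pi_1'$, $\pi_2'$, $\pi_3'$, $\pi_4'$, the counterpart $\beta' = \prod_{i=1}^{4} \pi_i'$, and $T_2'(\beta')$.]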

Then we note that, instead of using $\beta' = \prod_{i=1}^{4} \pi_i'$ to determine $\beta'$, we can alternatively derive it by $\beta' = \beta_1 \cdot \beta_2$, where $\beta_1$ and $\beta_2$ are the two partitions obtained by removing from $\beta$ the coordinate of the condition transplanted in $T_1'$. This process is shown in Figure 5.12 (b).
[Figure 5.12 (b): $\beta' = \beta_1 \cdot \beta_2$.]


So far, we have obtained two orthogonal decompositions $(\alpha, \beta)$ and $(\alpha', \beta')$ of $\pi$. $T_1'$ is obtained by expanding the tree $T_1$ in such a way that the two terminal nodes of $T_1$ are replaced by one condition. This implies $\alpha' \le \alpha$. Moreover, the following two points should be noted regarding this step:

1) the two terminal nodes of $T_1$ are replaced by the same condition, not by different conditions (one condition for one node and another condition for the other node), and

2) $\beta'$ can be determined by $\beta' = \beta_1 \cdot \beta_2$ as an alternative to $\beta' = \prod_{i=1}^{4} \pi_i'$.

We know that the pair $(I, \pi)$ is a trivial orthogonal decomposition of $\pi$. So we have obtained the following sequence of orthogonal decompositions of $\pi$:

$$(I, \pi) \to (\alpha, \beta) \to (\alpha', \beta').$$

If we regard the step from $(I, \pi)$ to $(\alpha, \beta)$ as choosing the coordinate $C_1$ from $\pi$ and transplanting it in the null tree (which realizes the $I$-partition), it forms $T_1$, and the corresponding $\beta$ can be determined by $\beta = \pi_1 \cdot \pi_2$, where $\pi_1$ and $\pi_2$ are the partitions obtained by removing the $C_1$ coordinate from $\pi$. In a similar way, the second step from $(\alpha, \beta)$ to $(\alpha', \beta')$ can be regarded as selecting a condition from $\beta$ and transplanting it at the two terminal nodes of $T_1$. Then its counterpart $\beta'$ is determined by $\beta' = \beta_1 \cdot \beta_2$, where $\beta_1$ and $\beta_2$ are the two partitions obtained by removing that coordinate from $\beta$. Continuing this process $n$ times, we obtain the following sequence of decompositions of $\pi$:

$$(I, \pi) \xrightarrow{\text{by } C_1} (\alpha, \beta) \to (\alpha', \beta') \to (\alpha'', \beta'') \to \cdots \to (\alpha^{(n)}, I),$$

where $I \ge \alpha \ge \alpha' \ge \alpha'' \ge \cdots$.

Then a question arises concerning point 1): why should we choose the same condition to be transplanted at the terminal nodes of $T_1$? To answer this question, choose $C_3$ and $C_4$ for the left and right terminal nodes of $T_1$, respectively, and denote this expanded tree by $T_1''$, as shown in Figure 5.13.
[Figure 5.13: the expanded tree $T_1''(\alpha'')$ and the partitions $\pi_1''$, $\pi_2''$, $\pi_3''$, $\pi_4''$.]


Immediately we see a contradiction to the necessary condition stated in Theorem 5.3. In fact, $\pi_1''$ and $\pi_2''$ have the essential conditions $C_2$ and $C_4$, whereas $\pi_3''$ and $\pi_4''$ have $C_2$ and $C_3$ as their essential conditions. Since $S_{\alpha''} \cup S_{\beta''} = \{\text{all conditions } C_1 \text{ through } C_4\}$ and $S_{\alpha''} = \{C_1, C_3, C_4\}$, $S_{\beta''} = \{C_2\}$ must hold. Therefore, we can conclude that this $\alpha''$ realized by $T_1''$ cannot have a counterpart $\beta''$ for which the pair is an orthogonal decomposition of $\pi$.

As shown in the above example, in general only one condition can be chosen and transplanted at all terminal nodes of $T_1$ to form the next tree $T_1'$. (This is also true in general for any step from $(\alpha^{(i)}, \beta^{(i)})$ to $(\alpha^{(i+1)}, \beta^{(i+1)})$.)




There is another aspect of this situation, however. It may seem to be just a special case, but it is actually important. It is explained in the following.

Consider the step from $(\alpha, \beta)$ again. If we instead choose another condition to be transplanted at the two terminal nodes of $T_1$, the corresponding tree $T_1^*$ is as in Figure 5.14 (a).


[Figure 5.14 (a): $T_1^*(\alpha^*)$ with the partitions $\pi_1^*, \ldots, \pi_4^*$, the counterpart $\beta^* = \prod_{i=1}^{4} \pi_i^*$, and $T_2^*(\beta^*)$.]

[Figure 5.14 (b): the modified $T_1^*$.]
Then the counterpart $\beta^*$ can be determined by $\beta^* = \prod_{i=1}^{4} \pi_i^*$. We note, however, that $\pi_1$ was essentially a 2-cube partition and that the chosen condition is not essential to $\pi_1$. Also, $\pi_1^* = \pi_2^*$ holds, where $\pi_1^*$ and $\pi_2^*$ are obtained by removing the chosen coordinate from $\pi_1$. These facts suggest taking the left descendant node (the chosen condition) out of $T_1^*$ and associating $\pi_1^*$ with the original left terminal node of $T_1$, as in Figure 5.14 (b).

Thus we have learned that the existence of an inessential condition of $\pi_1$ requires some modification of our procedure. That is, if this condition is chosen, $\pi_1$ degenerates to a "one-size-smaller" partition $\pi_1^*$ ($= \pi_2^*$) without this condition being transplanted at the corresponding terminal node of $T_1$.

Now we can give the complete form of the procedure for constructing a series of orthogonal decompositions of a given $\pi$. It is named Procedure D.
PROCEDURE D

1) Initially we set $\alpha^{(0)} = I$ and $\beta^{(0)} = \pi$, with corresponding essential condition sets $S_\alpha^{(0)} = \emptyset$ and $S_\beta^{(0)} = \{\text{all } C_i\}$, where $\#(S_\beta^{(0)}) = n$ (i.e., we assume that $\pi$ is essentially an $n$-cube partition). Let $T_1^{(0)}$ be a null tree.

2) For the step from $(\alpha^{(i)}, \beta^{(i)})$ to $(\alpha^{(i+1)}, \beta^{(i+1)})$, we let $T_1^{(i)}$ grow to $T_1^{(i+1)}$ by the following process.

a) Choose a condition $C^{(i+1)} \in S_\beta^{(i)}$, and let $S_\alpha^{(i+1)} = S_\alpha^{(i)} \cup \{C^{(i+1)}\}$ and $S_\beta^{(i+1)} = S_\beta^{(i)} - \{C^{(i+1)}\}$.

b) Let $j$ be the index of the $j$-th terminal node of $T_1^{(i)}$, and let $\pi_j^{(i)}$ be the partition associated with it. (Initially, $\pi_1^{(0)} = \pi$.)

b-1) If $C^{(i+1)}$ is essential to $\pi_j^{(i)}$, then replace the $j$-th terminal node of $T_1^{(i)}$ by $C^{(i+1)}$. Moreover, the two $(n-i-1)$-cube partitions $\pi_j^{(i+1)}$ and $\pi_{j'}^{(i+1)}$, obtained by removing the coordinate $C^{(i+1)}$ from the $(n-i)$-cube partition $\pi_j^{(i)}$, are associated with the two newly created left and right descendant nodes $j$ and $j'$ diverging from $C^{(i+1)}$. (See Figure 5.15 (a).)

[Figure 5.15 (a).]

b-2) If $C^{(i+1)}$ is not essential to $\pi_j^{(i)}$, leave this $j$-th terminal node as it is. With this node we associate the $(n-i-1)$-cube partition $\pi_j^{(i+1)}$, which is the degenerated form of the $(n-i)$-cube partition $\pi_j^{(i)}$ with respect to the coordinate $C^{(i+1)}$. (See Figure 5.15 (b).)

[Figure 5.15 (b): $T_1^{(i)}(\alpha^{(i)})$ and $T_1^{(i+1)}(\alpha^{(i+1)})$.]

c) Repeat the above process b) for all $j$. Then all the partitions produced by b-1) and b-2) form the new set of $\pi_j^{(i+1)}$ for all terminal nodes of the expanded tree $T_1^{(i+1)}$.

d) Split the $(n-i)$-cube partition $\beta^{(i)}$ into two $(n-i-1)$-cube partitions $\beta_1^{(i)}$ and $\beta_2^{(i)}$ by removing the $C^{(i+1)}$ coordinate from $\beta^{(i)}$. Let $\beta^{(i+1)} = \beta_1^{(i)} \cdot \beta_2^{(i)}$.

3) Repeat the above process 2) for $i = 0, 1, 2, \ldots, n-1$.
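Procedure D can be prototyped in a few dozen lines. The Python sketch below is an illustration under assumed encodings: blocks are strings over 0, 1 and '-'; conditions are 1-based indices; and the choice order is supplied by the caller and assumed to follow step 2a). Each terminal node of $T_1^{(i)}$ is stored as its cube together with $\pi_j^{(i)}$, and removed coordinates are kept as dashes instead of being deleted. The function names and the example partition are hypothetical.

```python
from __future__ import annotations

# Assumed encoding: blocks are strings over "01-", one character per condition;
# conditions are given by 1-based indices.


def intersect(a, b):
    out = []
    for x, y in zip(a, b):
        if x != "-" and y != "-" and x != y:
            return None
        out.append(y if x == "-" else x)
    return "".join(out)


def multiply(p, q):
    return frozenset(c for a in p for b in q if (c := intersect(a, b)) is not None)


def leq(p, q):
    return all(any(all(y == "-" or x == y for x, y in zip(a, b)) for b in q)
               for a in p)


def remove_coordinate(p, k, v):
    """Half of p with coordinate k equal to v, that coordinate then erased."""
    return frozenset(b[:k] + "-" + b[k + 1:] for b in p if b[k] in (v, "-"))


def procedure_d(pi, order):
    """Return [(alpha^(1), beta^(1)), ..., (alpha^(m), beta^(m))], m = len(order)."""
    n = len(next(iter(pi)))
    leaves = [("-" * n, pi)]   # step 1): null tree T1^(0), with pi_1^(0) = pi
    beta = pi                  # beta^(0) = pi
    steps = []
    for c in order:            # step 2a): c is C^(i+1), assumed to lie in S_beta
        k = c - 1
        new_leaves = []
        for cube, pj in leaves:                   # steps 2b) and 2c)
            if any(b[k] != "-" for b in pj):      # b-1): split this node on C^(i+1)
                for v in "01":
                    new_leaves.append((cube[:k] + v + cube[k + 1:],
                                       remove_coordinate(pj, k, v)))
            else:                                 # b-2): coordinate already a dash
                new_leaves.append((cube, pj))
        leaves = new_leaves
        beta = multiply(remove_coordinate(beta, k, "0"),   # step 2d)
                        remove_coordinate(beta, k, "1"))
        steps.append((frozenset(cube for cube, _ in leaves), beta))
    return steps


# Hypothetical 4-cube partition, processed in the order C4, C2, C1, C3.
P = frozenset({"0-00", "0-10", "1--0", "0001", "0011",
               "10-1", "0101", "0111", "11-1"})
steps = procedure_d(P, [4, 2, 1, 3])
print([len(multiply(a, b)) for a, b in steps])        # [12, 9, 12, 9]
print(all(leq(multiply(a, b), P) for a, b in steps))  # True
```

For this hypothetical partition the number of blocks of $\alpha^{(i)} \cdot \beta^{(i)}$ goes 12, 9, 12, 9. It grows at the steps where the chosen condition ($C_4$, then $C_1$) is essential to every $\pi_j^{(i)}$ and shrinks at the steps ($C_2$, then $C_3$) where it is inessential to some $\pi_j^{(i)}$, which is the kind of non-monotonic behaviour described in Theorem 5.5. The last pair is $(\pi, I)$.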

EXAMPLE 5.8.

Readers are encouraged to verify the above with Example 5.7. There we chose $C_1$ as $C^{(1)}$ at $i = 0$, and a pair $(T_1, T_2)$ was constructed which realized an orthogonal decomposition $(\alpha, \beta)$ of $\pi$. At the next step, $i = 1$, if we chose as $C^{(2)}$ the condition transplanted in $T_1'$, then the pair $(T_1', T_2')$ realizing $(\alpha', \beta')$ would be obtained. On the other hand, if we alternatively chose as $C^{(2)}$ the condition used in $T_1^*$, Procedure D would yield the pair $(T_1^*, T_2^*)$ realizing $(\alpha^*, \beta^*)$.

In the next theorem, we show that Procedure D guarantees the generation of a series of $n$ orthogonal decompositions of $\pi$.

THEOREM 5.4.

Procedure D, described above, generates a sequence of $n$ orthogonal decompositions of $\pi$,

$$(\alpha^{(1)}, \beta^{(1)}) \to (\alpha^{(2)}, \beta^{(2)}) \to (\alpha^{(3)}, \beta^{(3)}) \to \cdots \to (\alpha^{(n)}, I),$$

where $\alpha^{(i+1)} \le \alpha^{(i)}$.

PROOF :   The  proof  is  by  induction.   For  i  =  0,  it  is  true  that  a         ~   I 

and  p    =  7t  are  an  orthogonal  decomposition  of  it.  We  assume,  as  the 

induction  hypothesis,  that  (a       ,  p   )  is  an  orthogonal  decomposition 

of  it.   In  other  words,  P(i)  <  n  «(*)  s{^    n  sj^  =  0  and  Sn(1+l)  U  S^i+l) 

-Jjl      2     r      1        2 

{all  C.)  hold  for  the  i-th  step.   Then  we  show  that  p'1^  <  1J  xc/1+1', 

i  =  J   J 


io6 


S(i+1)  n  S^i+1)  =  0  and  s|i+l)  U  S^i+l)  =  {all  C.}   hold  for  the  (i+l)-th 
step.   Since  p    <  rt.   holds  for  every  j,  the  selection  C      causes 

J 

p(i)  <  Tr(1+1)  and  p(1+1^  <  n(:1+1'>  for  j  such  that  c'1^  is  essential 

s 

to  /1')  (by  the  processes  2)-  b-2)  and  2)-d)).   For  j  such  that  (T   ' 
J 

^ .  -,  ^   (i)   o(i)    (i)     j.  j.  n        j.1  j.  o(i)  ^  (i+l) 
is  not  essential  to  jt  .   ,  (3    <  jt.   unmediately  means  that  P    <  it . 

and  p^  <  irii+1'  are  true  by  2)-  b-2)  and  2)-d).   Therefore,  for  all  j, 

(i+i)  .  (i)  .  p(i)  <  (i+i)  holds. 

1      2   =  j 

By  the  way  of  construction,  P  .    is  the  maximum  element  over 
all  partitions  P'  such  that  P'  <  H  jr..   Therefore,  we  obtain  P      = 

H  J.x+1\      It  is  obvious  that  S^  n  sj^  =  0  and  S^  U  S^   =   {all  C.) 
lj  1      2     r      1      2  l 

hold  for  all  i  =  0,1,,,  n-1.   Then,  any  decomposition  (a   ,  P  " '  )  of 
jt  is  orthogonal. 

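One more property, $\alpha^{(i+1)} \le \alpha^{(i)}$, is also easily shown. The tree $T_1^{(i)}$ realizing $\alpha^{(i)}$ is a subtree of the tree $T_1^{(i+1)}$ which realizes $\alpha^{(i+1)}$. Therefore, $\alpha^{(i+1)} \le \alpha^{(i)}$. Q.E.D.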
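The subtree argument can be checked on a small case. The Python sketch below is our own illustration, not the thesis's data structure: a tree is assumed to be either `None` (a terminal node) or a tuple `(cond, low, high)` testing the 0-indexed condition `cond`, and the partition it realizes is assumed to have one block per terminal node reached by some point of the n-cube. The helper names `grow`, `realized_partition` and `refines` are hypothetical. Replacing terminal nodes by a new test, as in the step from $T_1^{(i)}$ to $T_1^{(i+1)}$, can only split blocks, so the new partition is less than or equal to the old one.

```python
from itertools import product

# Hypothetical representation: a tree is None (a terminal node) or a tuple
# (cond, low, high) that tests the 0-indexed condition `cond`.

def leaf_path(tree, point):
    """Return the branch decisions that lead `point` to a terminal node."""
    path = ()
    while tree is not None:
        cond, low, high = tree
        bit = point[cond]
        path += ((cond, bit),)
        tree = high if bit else low
    return path

def realized_partition(tree, n):
    """Partition of {0,1}^n with one block per terminal node reached."""
    blocks = {}
    for point in product((0, 1), repeat=n):
        blocks.setdefault(leaf_path(tree, point), set()).add(point)
    return [frozenset(b) for b in blocks.values()]

def refines(finer, coarser):
    """True if every block of `finer` lies inside a block of `coarser` (finer <= coarser)."""
    return all(any(b <= c for c in coarser) for b in finer)

def grow(tree, cond):
    """Replace every terminal node of `tree` by a test on `cond`."""
    if tree is None:
        return (cond, None, None)
    c, low, high = tree
    return (c, grow(low, cond), grow(high, cond))

n = 3
t1 = grow(None, 2)   # one test, on the third condition
t2 = grow(t1, 0)     # every terminal node of t1 now tests the first condition
a1, a2 = realized_partition(t1, n), realized_partition(t2, n)
print(len(a1), len(a2), refines(a2, a1), refines(a1, a2))  # 2 4 True False
```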

Before  ending  this  section,  another  theorem  is  presented 
which  states  the  relationship  among  those  decompositions  generated  by 
Procedure  D  from  the  viewpoint  of  objective  functions. 
THEOREM 5.5.

In the step from $(\alpha^{(i)}, \beta^{(i)})$ to $(\alpha^{(i+1)}, \beta^{(i+1)})$ in Procedure D (i = 0, 1, ..., n-1),

1) if a selected condition $C^{(i+1)}$ is essential to all $\pi_j^{(i)}$, then

$\alpha^{(i+1)} \cdot \beta^{(i+1)} \le \alpha^{(i)} \cdot \beta^{(i)}$

holds. However,

2) if $C^{(i+1)}$ is not essential to some $\pi_j^{(i)}$, then it may possibly occur that $\alpha^{(i+1)} \cdot \beta^{(i+1)}$ and $\alpha^{(i)} \cdot \beta^{(i)}$ are not comparable and that $C_1(\alpha^{(i+1)}, \beta^{(i+1)}) = \#(\alpha^{(i+1)} \cdot \beta^{(i+1)}) < C_1(\alpha^{(i)}, \beta^{(i)}) = \#(\alpha^{(i)} \cdot \beta^{(i)})$ is true.

That is, $\{\alpha^{(i)} \cdot \beta^{(i)}\}$ for i = 0, 1, 2, ..., n-1 is not always monotonic in terms of the inequality "≤". Therefore, $\{C_1(\alpha^{(i)}, \beta^{(i)})\}$ is not a monotonic function of i either, although $\{\alpha^{(i)}\}$ is monotonic.

PROOF: We prove 1) first. If the assumption in 1) holds, every terminal node of $T_1^{(i)}$ is replaced by $C^{(i+1)}$ as follows.

[Figure: $T_1^{(i)}(\alpha^{(i)})$ and the tree $T_1^{(i+1)}(\alpha^{(i+1)})$ obtained from it by this replacement.]

According to Theorem 5.1, $\alpha^{(i)} \cdot \beta^{(i)}$ is realized by $T_1^{(i)} * T_2^{(i)}$, and so $\alpha^{(i+1)} \cdot \beta^{(i+1)}$ is realized by $T_1^{(i+1)} * T_2^{(i+1)}$. Then we compare these two trees, $T_1^{(i)} * T_2^{(i)}$ and $T_1^{(i+1)} * T_2^{(i+1)}$.
Since $T_1^{(i)}$ is a subtree of $T_1^{(i+1)}$, attention is focused on the difference between $T_2^{(i)}$ of $T_1^{(i)} * T_2^{(i)}$ and the following subtree of $T_1^{(i+1)} * T_2^{(i+1)}$, shown in Figure 5.16.

FIGURE 5.16.
Then it is easy to see that the partition $\beta^{(i)}$ realized by $T_2^{(i)}$ is larger than or equal to the partition realized by this subtree, because $T_2^{(i+1)}$ realizes $\beta^{(i+1)}$, which is determined by

$\beta^{(i+1)} = \beta_1^{(i)} \cdot \beta_2^{(i)}$,

where $\beta_1^{(i)}$ and $\beta_2^{(i)}$ are the partitions obtained by removing the $C^{(i+1)}$ coordinate from $\beta^{(i)}$. The sketch in Figure 5.17 is helpful in understanding this fact.

FIGURE 5.17.

That is, in Figure 5.17, the partition $\beta^{(i)}$ is larger than or equal to the partition which is realized by the tree in the middle. This latter partition is larger than or equal to the partition realized by the third tree, since $\beta^{(i+1)} = \beta_1^{(i)} \cdot \beta_2^{(i)} \le \beta_1^{(i)}, \beta_2^{(i)}$. Therefore, we can now conclude that $T_1^{(i+1)} * T_2^{(i+1)}$ realizes a partition which is less than or equal to the partition realized by $T_1^{(i)} * T_2^{(i)}$. In other words, $\alpha^{(i+1)} \cdot \beta^{(i+1)} \le \alpha^{(i)} \cdot \beta^{(i)}$ holds.
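Both halves of this argument rest on the product (meet) of partitions, whose blocks are the nonempty intersections of one block from each factor. A minimal Python sketch, with partitions assumed to be lists of frozensets and helper names chosen for illustration only, shows that $\beta_1 \cdot \beta_2 \le \beta_1, \beta_2$ and that the product has at most $\#(\beta_1) \cdot \#(\beta_2)$ blocks, with equality exactly when every block of one factor meets every block of the other.

```python
from itertools import product

def meet(p, q):
    """Product p . q of two partitions: all nonempty pairwise block intersections."""
    return [b & c for b in p for c in q if b & c]

def refines(finer, coarser):
    """True if finer <= coarser, i.e. each block of finer lies in a block of coarser."""
    return all(any(b <= c for c in coarser) for b in finer)

cube = list(product((0, 1), repeat=2))
p = [frozenset(x for x in cube if x[0] == v) for v in (0, 1)]  # split on 1st coordinate
q = [frozenset(x for x in cube if x[1] == v) for v in (0, 1)]  # split on 2nd coordinate

pq = meet(p, q)
print(len(pq), refines(pq, p), refines(pq, q))  # 4 True True  (4 == 2 * 2)

pp = meet(p, p)
print(len(pp))  # 2, fewer than 2 * 2: blocks of p do not all meet each other
```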

To establish statement 2), it is sufficient to give the following example.

EXAMPLE 5.9.

Assume that we apply Procedure D to the following 5-cube partition π.


If we select conditions $C_3$ and $C_4$ at i = 0 and i = 1, respectively, then we obtain the following pair $T_1^{(2)}$ and $T_2^{(2)}$ realizing $\alpha^{(2)}$ and $\beta^{(2)}$, respectively.
[Figure: $T_1^{(2)}(\alpha^{(2)})$ with $\#(\alpha^{(2)}) = 4$, and $T_2^{(2)}(\beta^{(2)})$ with $\#(\beta^{(2)}) = 5$.]


Then we choose C as $C^{(i+1)}$ for i = 2. This condition C is not essential to some partitions $\pi_j^{(2)}$, and Procedure D generates the pair $T_1^{(3)}$ and $T_2^{(3)}$ as follows.


[Figure: $T_1^{(3)}(\alpha^{(3)})$ with $\#(\alpha^{(3)}) = 6$, and $T_2^{(3)}(\beta^{(3)})$ with $\#(\beta^{(3)}) = 3$.]
Then we can show below two orthogonal decompositions $\alpha^{(2)} \cdot \beta^{(2)}$ and $\alpha^{(3)} \cdot \beta^{(3)}$ which are not comparable.

[Figure: $\alpha^{(2)} \cdot \beta^{(2)}$, with $C_1(\alpha^{(2)}, \beta^{(2)}) = 20$, and $\alpha^{(3)} \cdot \beta^{(3)}$, with $C_1(\alpha^{(3)}, \beta^{(3)}) = 18$.]


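Furthermore, $C_1(\alpha^{(3)}, \beta^{(3)}) < C_1(\alpha^{(2)}, \beta^{(2)})$ is true, because $C_1(\alpha^{(3)}, \beta^{(3)}) = \#(\alpha^{(3)}) \cdot \#(\beta^{(3)}) = 18 < C_1(\alpha^{(2)}, \beta^{(2)}) = \#(\alpha^{(2)}) \cdot \#(\beta^{(2)}) = 20$.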




Q.E.D. 
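For reference, the block counts of Example 5.9 also give the second objective function, $C_2(\alpha, \beta) = \#(\alpha) + \#(\beta) - 2$; here $\#(\alpha^{(2)}) = 4$ follows from $C_1(\alpha^{(2)}, \beta^{(2)}) = 20$ and $\#(\beta^{(2)}) = 5$:

$$C_2(\alpha^{(2)}, \beta^{(2)}) = 4 + 5 - 2 = 7, \qquad C_2(\alpha^{(3)}, \beta^{(3)}) = 6 + 3 - 2 = 7.$$

So in this example the step from i = 2 to i = 3 lowers $C_1$ from 20 to 18 while leaving $C_2$ unchanged.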
5.5. Discussion of Optimal Decompositions

In the previous section we presented Procedure D, which constructs a series of orthogonal decompositions $\{(\alpha^{(i)}, \beta^{(i)})\}$ for a given partition π. However, we did not show which condition $C^{(i+1)}$ should be selected at each step of the procedure. The role of this procedure is therefore quite similar to that of Procedure R in Chapter 3 for constructing a decision tree for a given partition π. Procedure R did not show which condition $C_s$ should be selected at the i-th step either; it showed only how to construct one of the decision trees realizing π' (π' ≤ π) for a given π. For both procedures, it is quite obvious that the way of choosing the condition $C^{(i+1)}$ in Procedure D (or $C_s$ in Procedure R) greatly influences the cost of the constructed decision trees.

For decomposition problems, we proposed two objective functions in Section 5.2. It is hoped that algorithms for constructing optimal decompositions with respect to these objective functions can be developed on the basis of Procedure D; we expect them to take the form of intuitive (heuristic) algorithms or of formulations using mathematical programming tools.

In what follows, we first discuss the exhaustive search for optimal solutions and then propose a heuristic algorithm. In Procedure D, there are n! possible ways to select a sequence of $C^{(i)}$'s. For each sequence, n orthogonal decompositions are generated, excluding the trivial decomposition (I, π). Then, in total, n · n! orthogonal decompositions can be generated if we exhaust all possibilities. This number, however, can be reduced to n · n!/2, since there is no essential difference between $\alpha^{(i)}$ and $\beta^{(i)}$: an $\alpha^{(i)}$ can appear as one of the $\beta^{(i)}$'s, and a $\beta^{(i)}$ as one of the $\alpha^{(i)}$'s.
The structure of the exhaustive algorithm can best be explained with the help of the following tree.

[Figure: the generation tree, with root START.]

The node (pq...r) at the i-th level of the tree stands for $T_1^{(i)}$ constructed by $C^{(1)} = C_p$, $C^{(2)} = C_q$, ..., and $C^{(i)} = C_r$ at the successive steps of Procedure D. From a node at the i-th level, (n-i) edges, showing the possible (n-i) choices of $C^{(i+1)}$, diverge, and each of them connects to a node at the (i+1)-th level; for example, the edge $C_3$ connects the node (12) at the second level to the node (123) at the third level. This transition is shown next more explicitly. (We note that a degeneration occurs in T.)

[Figure: the trees at node (12) and node (123).]


We learned that there are n · n! possible nodes in the above tree. Not all of them, however, are distinct. We show an example. Corresponding to nodes (21) and (12), we have two decision trees $T_1^{(2)}$ and $T_1^{(2)'}$, shown below, respectively.

[Figure: the decision trees $T_1^{(2)}$ and $T_1^{(2)'}$, whose terminal nodes carry the subpartitions $\pi_1^{(2)}, \ldots, \pi_4^{(2)}$ and $\pi_1^{(2)'}, \ldots, \pi_4^{(2)'}$.]
We note that the two partitions $\alpha^{(2)}$ and $\alpha^{(2)'}$ realized by $T_1^{(2)}$ and $T_1^{(2)'}$ are identical and, moreover, $\pi_1^{(2)} = \pi_1^{(2)'}$, $\pi_2^{(2)} = \pi_3^{(2)'}$, $\pi_3^{(2)} = \pi_2^{(2)'}$ and $\pi_4^{(2)} = \pi_4^{(2)'}$ hold. Then their counterparts $\beta^{(2)}$ and $\beta^{(2)'}$ are also the identical partition. In other words, the two orthogonal decompositions $(\alpha^{(2)}, \beta^{(2)})$ and $(\alpha^{(2)'}, \beta^{(2)'})$ are identical. Furthermore, any selection of the same $C^{(k)}$ (k = 3, 4, ..., n-1) for developing both $T_1^{(2)}$ and $T_1^{(2)'}$ by Procedure D yields the same decomposition $(\alpha^{(k)}, \beta^{(k)})$ for k = 3, 4, ..., n-1. In summary, we need not distinguish the node (21) from the node (12). Therefore, we let them merge together in the tree as follows.

[Figure: the nodes (21) and (12) merged into a single node.]
However, if degeneration occurs for $T_1^{(2)'}$, for example, we have a different situation. That is, the following two decision trees $T_1^{(2)}$ and $T_1^{(2)'}$ should be distinguished from each other, because the application of Procedure D to these two trees results in different decompositions. Then their corresponding nodes (21) and (12) should not merge.

[Figure: the decision trees $T_1^{(2)}$ and $T_1^{(2)'}$ when degeneration occurs in $T_1^{(2)'}$.]

How many mergings of nodes occur in the generation tree depends on the given partition π. If, as an extremal case, no degeneration occurs at any step of Procedure D for any selected sequence of $C^{(i)}$'s, then the generation tree reduces, through mergings of nodes, to the following small tree.

[Figure: the merged generation tree.]

There, we have only the nodes $(\ell_1 \ell_2 \ldots \ell_i)$ with $\ell_1 < \ell_2 < \cdots < \ell_i$ at the i-th level for i = 1, 2, ..., n. This means that there are exactly $\binom{n}{i}$ nodes at the i-th level, and we can conclude that the total number of nodes, counting the START node at level 0, amounts to $\sum_{i=0}^{n} \binom{n}{i} = 2^n$.
Instead of enumerating all n · n! candidates (this is the other extremal case), we usually need to consider a number of decompositions between $2^n$ and n · n! for the exhaustive method.
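The counts in this discussion can be reproduced for small n. The Python sketch below is an illustration only (the function name `generation_counts` is hypothetical): it counts the n · n! decompositions generated when all n! selection sequences are followed to the end, the distinct nodes (pq...r) at levels 1 to n of the unmerged tree, and the nodes left when every node is merged with those that select the same set of conditions, as in the extremal case without degeneration.

```python
from itertools import permutations
from math import factorial

def generation_counts(n):
    """Return (pairs, prefixes, merged) for n conditions numbered 1..n.

    pairs:    one decomposition per step of each of the n! sequences, n * n!
    prefixes: distinct nodes (pq...r) at levels 1..n of the unmerged tree
    merged:   distinct condition sets at levels 1..n (every merge applied)
    """
    seqs = list(permutations(range(1, n + 1)))
    pairs = sum(len(s) for s in seqs)
    prefixes = {s[:i] for s in seqs for i in range(1, n + 1)}
    merged = {frozenset(p) for p in prefixes}
    return pairs, len(prefixes), len(merged)

for n in (3, 4):
    pairs, prefixes, merged = generation_counts(n)
    assert pairs == n * factorial(n)
    print(n, pairs, prefixes, merged)
```

For n = 3 and n = 4 it prints `3 18 15 7` and `4 96 64 15`; adding the START node to the last figure gives $2^n$. For n ≥ 3 the unmerged tree has fewer distinct nodes than n · n!, because sequences sharing a prefix share the corresponding nodes, so n · n! counts the generated decompositions with repetition.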


If we calculate the two objective functions $C_1(\alpha^{(i)}, \beta^{(i)})$ and $C_2(\alpha^{(i)}, \beta^{(i)})$ for all decompositions that correspond to nodes of this tree, we can find the best solution. In the worst case, however, this requires the effort of generating all n · n! candidates. Therefore, we give up the guarantee of the best solution and, to save the work that the exhaustive search would require, look instead for reasonable, or suboptimal, solutions for both objective functions.
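The exhaustive method amounts to walking the (unmerged) tree and evaluating both objective functions at every node. Procedure D itself is not reproduced in the Python sketch below: `decompose(seq)` is a hypothetical stand-in that must return (#(α), #(β)) for the decomposition reached by selecting the conditions in `seq` in order, and the toy oracle in the usage lines is invented purely to exercise the code.

```python
def c1(na, nb):
    """First objective function C_1 = #(alpha) * #(beta)."""
    return na * nb

def c2(na, nb):
    """Second objective function C_2 = #(alpha) + #(beta) - 2."""
    return na + nb - 2

def exhaustive(conditions, decompose):
    """Map every selection prefix (node of the tree) to its (C_1, C_2) values."""
    table = {}

    def visit(seq):
        for c in conditions:
            if c in seq:
                continue
            nxt = seq + (c,)
            na, nb = decompose(nxt)
            table[nxt] = (c1(na, nb), c2(na, nb))
            visit(nxt)

    visit(())
    return table

# Toy oracle (hypothetical numbers): #(alpha) doubles per step, and starting
# with condition 1 keeps #(beta) smallest.
def toy_decompose(seq):
    k = len(seq)
    return 2 ** k, 16 // 2 ** k + seq[0] - 1

table = exhaustive((1, 2, 3, 4), toy_decompose)
best = min(table, key=lambda s: table[s][1])   # first node with the least C_2
print(len(table), best, table[best])           # 64 (1, 2) (16, 6)
```

Ranking by $C_2$ is only one possible use; the table keeps both values so that either objective function can be examined.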

Based on Procedure D, Theorem 5.4 and Theorem 5.5, we discuss a heuristic algorithm for finding suboptimal orthogonal decompositions.


We have already shown the following facts for any selected sequence of $C^{(i+1)}$ (i = 0, 1, ..., n-1).

1) $\alpha^{(i)}$ decreases as i increases. That is,

$\alpha^{(0)} = I \ge \alpha^{(1)} \ge \alpha^{(2)} \ge \cdots$,

and $\#(\alpha^{(i)})$ is a monotonically increasing function of i,

$\#(\alpha^{(0)}) = 1 \le \#(\alpha^{(1)}) \le \#(\alpha^{(2)}) \le \cdots$

(Theorem 5.4.)

2) $\alpha^{(i)} \cdot \beta^{(i)}$ is not monotonic (Theorem 5.5.). In general, however, $\#(\alpha^{(i)} \cdot \beta^{(i)})$ is globally increasing, i.e.,

$\#(\alpha^{(0)} \cdot \beta^{(0)}) = \#(I \cdot \pi) = \#(\pi) \le \#(\alpha^{(1)} \cdot \beta^{(1)}) \le \#(\alpha^{(2)} \cdot \beta^{(2)}) \le \cdots$.

3) It has not been stated explicitly, but $\#(\beta^{(i)})$ is generally decreasing, i.e.,

$\#(\beta^{(0)}) = \#(\pi) \ge \#(\beta^{(1)}) \ge \#(\beta^{(2)}) \ge \cdots$.

(Since any orthogonal decomposition $\alpha \cdot \beta$ of π must satisfy $\alpha \cdot \beta \le \pi$, $\#(\alpha \cdot \beta) \ge \#(\pi)$ holds. For the first objective function, therefore, the trivial decomposition $(\alpha^{(0)}, \beta^{(0)}) = (I, \pi)$ is optimal, because $C_1(I, \pi) = \#(\pi)$ achieves the minimum #(π). However, this decomposition is not reasonable, since its second objective function $C_2(I, \pi) = \#(I) + \#(\pi) - 2$ is unreasonably large. Therefore, we interpret "suboptimal" as "reasonably good" for both objective functions, $C_1(\alpha, \beta)$ and $C_2(\alpha, \beta)$.)

Now we have two criteria, i.e., $C_1(\alpha, \beta) = \#(\alpha) \cdot \#(\beta)$ and $C_2(\alpha, \beta) = \#(\alpha) + \#(\beta) - 2$. In order to find a reasonable solution by applying Procedure D, an algorithm has to show: 1) which condition $C^{(i+1)}$ should be chosen at each step and 2) at which i we should stop Procedure D. The first requirement corresponds to selecting an edge of the tree (defined before) and the second requirement corresponds to knowing at which level of that tree we should stop. This is shown in the sketch below.

[Figure: a path through the generation tree from START to STOP.]


In order to explain these points more simply, we assume that $C_1(\alpha^{(i)}, \beta^{(i)}) = \#(\alpha^{(i)}) \cdot \#(\beta^{(i)})$ is constant for any i and for any selected sequence of $C^{(i+1)}$. Then an optimal decomposition for the second objective function $C_2(\alpha^{(i)}, \beta^{(i)}) = \#(\alpha^{(i)}) + \#(\beta^{(i)}) - 2$ can be found around the i-th step at which $\#(\alpha^{(i)}) = \#(\beta^{(i)})$ is attained. (Recall that, for positive real numbers x and y, x + y is minimized at x = y if x · y is constant.) This strong assumption of $\#(\alpha^{(i)}) \cdot \#(\beta^{(i)})$ being constant is not true in general, but if $\#(\alpha^{(i)}) \cdot \#(\beta^{(i)})$ is gradually increasing and does not deviate much from the lower bound #(π), the above criterion, $\#(\alpha^{(i)}) = \#(\beta^{(i)})$, seems reasonable.
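The parenthetical remark is the inequality between the arithmetic and geometric means. For positive x and y,

$$x + y - 2\sqrt{xy} = \left(\sqrt{x} - \sqrt{y}\right)^2 \ge 0,$$

so $x + y \ge 2\sqrt{xy}$, with equality exactly when x = y. With $x = \#(\alpha^{(i)})$ and $y = \#(\beta^{(i)})$ this gives, even without the constant-product assumption, the bound $C_2(\alpha^{(i)}, \beta^{(i)}) \ge 2\sqrt{C_1(\alpha^{(i)}, \beta^{(i)})} - 2$, which is tight when the two block counts are equal.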

In summary, we choose a condition $C^{(i+1)}$ at each step so that its resultant $C_1(\alpha^{(i+1)}, \beta^{(i+1)}) = \#(\alpha^{(i+1)}) \cdot \#(\beta^{(i+1)})$ is minimized over all possible choices of conditions, and we stop this algorithm around the i at which $\#(\alpha^{(i)}) = \#(\beta^{(i)})$ is achieved. If we recall here that $\#(\beta^{(i)})$ is generally decreasing and $\#(\alpha^{(i)})$ is increasing, we can see that we reach this i in a straightforward way, without much loss. Therefore, this method of searching for suboptimal solutions is not impractical.

The condition $\#(\alpha^{(i)}) = \#(\beta^{(i)})$ is attained around the i at which the derivative (gradient) of $C_2(\alpha^{(i)}, \beta^{(i)})$ changes from negative to positive. (Recall also that the derivative of x + y changes from negative to positive at x = y as x increases, assuming that x · y is constant.) Therefore, we can alternatively state that we stop Procedure D around the i at which $C_2(\alpha^{(i+1)}, \beta^{(i+1)}) \ge C_2(\alpha^{(i)}, \beta^{(i)})$ is attained.
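The derivative remark can be made explicit under the same idealization $x \cdot y = c$ with $c > 0$: substituting $y = c/x$ gives

$$\frac{d}{dx}\left(x + \frac{c}{x}\right) = 1 - \frac{c}{x^2},$$

which is negative for $0 < x < \sqrt{c}$, zero at $x = \sqrt{c} = y$, and positive for $x > \sqrt{c}$. Since $\#(\alpha^{(i)})$ grows with i, the first step at which $C_2$ stops decreasing is the discrete counterpart of this sign change.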

Now  we  give  the  complete  statement  of  the  algorithm. 


ALGORITHM

At each step of Procedure D, choose a condition $C^{(i+1)}$ such that its resultant orthogonal decomposition $(\alpha^{(i+1)}, \beta^{(i+1)})$ minimizes $C_1(\alpha^{(i+1)}, \beta^{(i+1)}) = \#(\alpha^{(i+1)}) \cdot \#(\beta^{(i+1)})$. Repeat the steps from i = 0 through the k at which $\#(\alpha^{(k)}) = \#(\beta^{(k)})$ is achieved or, alternatively, $C_2(\alpha^{(k+1)}, \beta^{(k+1)}) \ge C_2(\alpha^{(k)}, \beta^{(k)})$ is attained.
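A minimal Python sketch of this algorithm follows. Procedure D is again abstracted away: `decompose(seq)` is a hypothetical callable returning (#(α), #(β)) for the decomposition reached by selecting the conditions in `seq` in order, and `n_pi` stands for #(π), so that the starting decomposition (I, π) has block counts (1, #(π)). Ties in $C_1$ are broken by taking the first condition in the given order, a detail the algorithm leaves open, and only the second stopping test is implemented.

```python
def c1(na, nb):
    """C_1 = #(alpha) * #(beta)."""
    return na * nb

def c2(na, nb):
    """C_2 = #(alpha) + #(beta) - 2."""
    return na + nb - 2

def greedy_decomposition(conditions, decompose, n_pi):
    """Greedy heuristic: minimize C_1 at each step; stop once C_2 stops decreasing.

    Returns (seq, counts, history): seq selects (alpha^(k), beta^(k)), counts is
    (#(alpha^(k)), #(beta^(k))), and history lists every accepted step from i = 0.
    """
    seq, counts = (), (1, n_pi)          # (alpha^(0), beta^(0)) = (I, pi)
    history = [(seq, counts)]
    remaining = list(conditions)
    while remaining:
        # C^(i+1): the condition whose resulting decomposition has the least C_1.
        best = min(remaining, key=lambda c: c1(*decompose(seq + (c,))))
        nxt = decompose(seq + (best,))
        if c2(*nxt) >= c2(*counts):      # C_2(k+1) >= C_2(k): stop at k
            break
        seq, counts = seq + (best,), nxt
        remaining.remove(best)
        history.append((seq, counts))
    return seq, counts, history

# Toy oracle with invented block counts; it depends only on the set of
# selected conditions, as in the case without degeneration.
TOY = {
    frozenset("a"): (2, 6), frozenset("b"): (2, 8), frozenset("c"): (3, 6),
    frozenset("ab"): (4, 3), frozenset("ac"): (4, 5), frozenset("bc"): (6, 3),
    frozenset("abc"): (12, 1),
}
seq, counts, history = greedy_decomposition("abc", lambda s: TOY[frozenset(s)], 12)
print(seq, counts, len(history))  # ('a', 'b') (4, 3) 3
```

The first stopping test, $\#(\alpha^{(k)}) = \#(\beta^{(k)})$, could be added as a second `break` condition on `counts`; the sketch keeps only the $C_2$ test because it needs no tolerance for "approximately equal".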

EXAMPLE 5.10.


We show here an example for which the above algorithm works effectively. Assume the following 5-cube partition π.

[Figure: the partition π, with #(π) = 16.]

Then our algorithm selects the conditions $C^{(1)}$, $C^{(2)}$ and $C^{(3)}$ shown in Figure 5.18. At i = 1, $\#(\alpha^{(2)}) \approx \#(\beta^{(2)})$ is attained, and $C_2(\alpha^{(3)}, \beta^{(3)}) \ge C_2(\alpha^{(2)}, \beta^{(2)})$ at i = 2. Therefore, we stop the procedure at i = 1 or 2. Both are reasonably good decompositions. In Figure 5.18, $(\alpha^{(i)}, \beta^{(i)})$ as well as $(T_1^{(i)}, T_2^{(i)})$ for i = 1, 2, 3 are shown.


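FIGURE 5.18. The trees $T_1^{(i)}(\alpha^{(i)})$ and $T_2^{(i)}(\beta^{(i)})$ for i = 1, 2, 3, with the following counts.

1) i = 0, $C^{(i+1)} = C^{(1)}$: $\#(\alpha^{(1)}) = 2$, $\#(\beta^{(1)}) = 9$, $C_1(\alpha^{(1)}, \beta^{(1)}) = 18$, $C_2(\alpha^{(1)}, \beta^{(1)}) = 9$.

2) i = 1, $C^{(i+1)} = C^{(2)}$: $\#(\alpha^{(2)}) = 4$, $\#(\beta^{(2)}) = 5$, $C_1(\alpha^{(2)}, \beta^{(2)}) = 20$.

3) i = 2, $C^{(i+1)} = C^{(3)}$: $\#(\alpha^{(3)}) = 7$, $\#(\beta^{(3)}) = 3$, $C_1(\alpha^{(3)}, \beta^{(3)}) = 21$, $C_2(\alpha^{(3)}, \beta^{(3)}) = 8$.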

To see what follows if we continue this procedure, we show below $T_1^{(4)}$ and $T_2^{(4)}$, which are generated by choosing $C^{(i+1)} = C^{(4)}$.


4) i = 3, $C^{(i+1)} = C^{(4)}$:

[Figure: $T_1^{(4)}(\alpha^{(4)})$ and $T_2^{(4)}(\beta^{(4)})$.]

$\#(\alpha^{(4)}) = 14$, $\#(\beta^{(4)}) = 2$, $C_1(\alpha^{(4)}, \beta^{(4)}) = 28$.
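The stopping rule can be checked against the block counts stated in Example 5.10. The Python lines below use only those counts, together with $\#(\alpha^{(0)}) = \#(I) = 1$ and $\#(\beta^{(0)}) = \#(\pi) = 16$ for the starting decomposition, and find the first k with $C_2(\alpha^{(k+1)}, \beta^{(k+1)}) \ge C_2(\alpha^{(k)}, \beta^{(k)})$.

```python
# Block counts (#(alpha^(i)), #(beta^(i))) for i = 0..4, as stated in Example 5.10.
counts = [(1, 16), (2, 9), (4, 5), (7, 3), (14, 2)]

c1 = [a * b for a, b in counts]        # C_1 = #(alpha) * #(beta)
c2 = [a + b - 2 for a, b in counts]    # C_2 = #(alpha) + #(beta) - 2
print(c1)  # [16, 18, 20, 21, 28]
print(c2)  # [15, 9, 7, 8, 14]

# First k with C_2(k+1) >= C_2(k): the stopping point of the algorithm.
k = next(i for i in range(len(c2) - 1) if c2[i + 1] >= c2[i])
print(k, counts[k])  # 2 (4, 5)
```

The rule stops at k = 2, that is, at $(\alpha^{(2)}, \beta^{(2)})$ with $\#(\alpha^{(2)}) = 4 \approx 5 = \#(\beta^{(2)})$, in agreement with the example; $(\alpha^{(3)}, \beta^{(3)})$, the other candidate, has $C_2 = 8$ against 7.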
APPENDIX: LITERATURE SURVEY

Much has been published concerning decision tables, decision table languages and their applications in various areas. The following books can serve as an introduction to this subject: Hughes et al. [6], Katzan [9], McDaniel [17], [18], [19] and Pollack et al. [24]. Most of these texts cover introductory material through some specific applications and/or decision table languages. Some of them also include topics on decision table conversion problems. A good summary and concise survey of research topics in this field can be found in Katzan [9] and King [12].

Many useful discussions concerning decision table conversion problems were first given by Montalbano [20]. Egler [2] attempted to give a very simple manual method for converting decision tables into decision trees. He thought that it minimizes both the average processing time and the storage requirement. Montalbano [21], however, refuted Egler's algorithm by giving a counterexample. Pollack [23] proposed two plausible procedures: one for minimizing the storage requirement and the other for minimizing the average processing time. He asked readers to prove his algorithms or to offer counterexamples showing that they fail. There is a counterexample in Sprague [31] which shows that neither algorithm guarantees optimality for its respective objective. By introducing the concept of entropy from information theory, Schwayder [28] modified Pollack's algorithms, but it is known that the modified algorithm does not always generate an optimal tree either. Earlier, before Pollack's paper [23] appeared, Press [25] gave another simple manual procedure as well as an interesting discussion of decision table languages.
None of the methods discussed above generates optimal trees in all cases. On the other hand, Reinwald and Soland attack the conversion problems by a branch-and-bound method in their papers [26], [27]. Their algorithms are guaranteed to produce optimal trees, but they are fairly complex and time consuming.

Recent work by Alster [1] attempts to extend decision table conversion problems to a more general decision tree construction problem. It deals with the construction of optimal decision trees not only for rule partitions but also for general partitions. Several heuristic algorithms for minimizing the number of internal nodes of the trees are described, together with results from their computer implementation.

Another decision tree construction problem, called a binary identification problem by the author, can be found in Garey [3]. The decision tables considered do not have "-" (dash) entries, which means that the corresponding cube partitions are of the following special type: for a decision table with m rule columns and n condition rows, the corresponding partition consists of m 0-cubes (each corresponding to a rule) and one block of $(2^n - m)$ 0-cubes (the Else-rule). Garey's approach is similar to the well-known optimal binary search tree constructions (Knuth [15] and Hu and Tucker [7]). His main discussion concerns how the exhaustive algorithm which he describes can be improved. He finds some specific relationships among the probabilities of occurrence of rules and/or the costs of conditions which, if they are met, reduce the amount of work.
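The special partitions in Garey's setting are easy to build. In the Python sketch below, the rule columns are hypothetical example data and `garey_partition` is an illustrative name: each of the m dash-free rules becomes a 0-cube (a single point of the n-cube), and all remaining $2^n - m$ points form the Else block.

```python
from itertools import product

def garey_partition(rules, n):
    """Cube partition for a decision table without dash entries.

    rules: the m rule columns, each a tuple of n condition values in {0, 1}
    Returns m singleton blocks plus, if nonempty, the Else block of 2^n - m points.
    """
    rule_set = set(rules)
    if len(rule_set) != len(rules):
        raise ValueError("rule columns must be distinct")
    blocks = [frozenset([r]) for r in rules]
    else_block = frozenset(p for p in product((0, 1), repeat=n) if p not in rule_set)
    if else_block:
        blocks.append(else_block)
    return blocks

rules = [(0, 0, 1), (1, 0, 1), (1, 1, 0)]   # hypothetical table: m = 3, n = 3
pi = garey_partition(rules, 3)
print(len(pi), len(pi[-1]))  # 4 5
```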

Also to be mentioned here is an earlier book by Picard [22], which contains a number of results about general decision trees, usually of the type that under certain conditions a tree of a certain structure is optimal.

There are also similarities between the construction of decision trees and relay network realizations of Boolean functions (see, e.g., Harrison [4]). Since a transfer relay in a network plays the same role as a decision box in a decision tree, transfer relay realizations of Boolean functions are apparently a special case of decision tree constructions. In Marcus [16], an algorithm to realize a Boolean function with a small number of transfer relays, using Karnaugh map techniques, is proposed. We know that the iterated local minimization in this thesis is a generalization of his method. Also, a correspondence between decision tree constructions and transfer relay realizations of Boolean functions is described by Seshagiri [29]. The objective of these authors is to reduce the number of relays used in realizations of Boolean functions. There is no such concept as an average processing time in switching theory, since all relays require the same period for their operation. Slagle's work [30] is closer to our subject. He discusses an effective binary decision tree construction for a given Boolean expression.

In this thesis we discussed only methods for converting decision tables into decision trees. There are, however, other fundamentally different approaches to processing tables by computers. Kirk [13] proposed, and King [10] developed, the use of mask matrix techniques. Veinott [32] also shows a programming technique for implementing tables as computer programs. These methods, however, require the evaluation of all conditions for each input datum, which is obviously wasteful of execution time.
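For contrast, the following Python sketch shows the general idea behind mask-based table processing as we understand it, not necessarily the exact formulation of Kirk [13] or King [10]: every condition is evaluated first, and each rule is then matched by masking out the conditions it does not test (its dash entries). The helper names and the example table are hypothetical. The loop in `evaluate_conditions` is the per-datum cost that the text points to.

```python
def evaluate_conditions(conditions, datum):
    """Evaluate every condition on the datum; bit j is set when condition j holds."""
    bits = 0
    for j, cond in enumerate(conditions):
        if cond(datum):
            bits |= 1 << j
    return bits

def mask_lookup(rules, bits):
    """Return the action of the first rule whose tested conditions match `bits`.

    Each rule is (mask, value, action): mask has bit j set if the rule tests
    condition j (a dash entry leaves it clear), and value gives the required bits.
    """
    for mask, value, action in rules:
        if (bits & mask) == value:
            return action
    return None  # no rule applies (the Else case)

# Hypothetical table: C1 = "x > 0" (bit 0), C2 = "x is even" (bit 1).
conditions = [lambda x: x > 0, lambda x: x % 2 == 0]
rules = [
    (0b11, 0b11, "A"),  # C1 = Y, C2 = Y
    (0b01, 0b01, "B"),  # C1 = Y, C2 = -
    (0b01, 0b00, "C"),  # C1 = N, C2 = -
]
for x in (4, 3, -2):
    print(x, mask_lookup(rules, evaluate_conditions(conditions, x)))
```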
Finally, we mention two papers, King [11] and Press [25], for discussions of ambiguity and redundancy problems in decision tables.